OWASP LLM Top 10: A Practical Defense Guide

Why the OWASP LLM Top 10 matters more for agents

The original OWASP LLM Top 10 was written with chatbot-style applications in mind — a user sends a message, the model responds with text. The blast radius of an attack is limited to the conversation. But autonomous agents are different. When an attack succeeds against an agent, the attacker doesn't just get bad text output — they get code execution, file system access, API calls, and potentially full system compromise.

Clawmont's four security pillars were designed against this exact threat model. Let's walk through each OWASP item and see how it maps to real agent attacks — and real defenses.

The ten risks, applied to autonomous agents

LLM01 — Prompt Injection

The risk: An attacker manipulates the LLM through crafted inputs that cause the model to execute unintended actions. In agents, this is catastrophic: the "unintended action" isn't wrong text — it's a shell command, a file deletion, or a credential theft.

Agent-specific danger: Indirect prompt injection is especially dangerous. The agent reads a file, a web page, or an API response that contains adversarial instructions. The model follows those instructions because it can't distinguish data from commands.

Defense: Clawmont's input rail scans all incoming text — user prompts and tool outputs — for known injection patterns, semantic manipulation, and obfuscated payloads. The detection corpus includes 2,300+ scenarios from real-world red-team exercises. Try it live.

LLM02 — Insecure Output Handling

The risk: LLM output is used downstream without validation, enabling XSS, SSRF, or code injection in connected systems.

Agent-specific danger: Agents routinely pass model output into tool calls. The model generates a shell command, and the tool executor runs it. The model generates a file path, and the tool writes to it. Every tool call is an output-handling boundary.

Defense: Clawmont's tool dispatch pillar validates every tool call before execution. Dangerous commands are blocked. Known attack patterns — like curl to external hosts with local credential reads — are caught at the semantic level, not just string matching.

LLM03 — Training Data Poisoning

The risk: Manipulated training data causes the model to have embedded vulnerabilities, biases, or backdoors.

Agent-specific danger: Agents often use retrieval-augmented generation (RAG) with live data sources. Poisoning the retrieval corpus is the agentic equivalent of training data poisoning — and it's much easier because the data is often user-controlled.

Defense: Runtime detection. Even if the model's behavior has been influenced by poisoned data, the security layer catches dangerous outputs at the tool-call boundary. Clawmont's tool response pillar screens what comes back from external sources before it reaches the model.

LLM04 — Model Denial of Service

The risk: Resource-heavy prompts or recursive loops exhaust compute budget.

Agent-specific danger: Agents can be tricked into infinite loops — read a file that says "now read this other file," which says "now read the first file again." Without loop detection, this burns API credits until the budget is exhausted. Clawmont's input rail includes heuristics for recursive patterns and context-bombing attempts.

LLM05 — Supply Chain Vulnerabilities

The risk: Compromised plugins, models, or dependencies introduce vulnerabilities.

Agent-specific danger: Agents use plugins and tool packages from package registries. A compromised npm package used by an agent's tool can exfiltrate data or execute arbitrary code. Clawmont runs with minimal runtime dependencies — only @clack/prompts — to minimize supply chain risk. The plugin's 2,700+ tests verify behavior on every commit.

LLM06 — Sensitive Information Disclosure

The risk: The LLM reveals confidential data in its responses.

Agent-specific danger: This is one of the most critical risks for agents. The model has access to your environment variables, SSH keys, API tokens, and database credentials through its tool calls. A successful prompt injection can instruct the agent to include these in its output, log them to a file, or send them to an external endpoint.

Defense: Clawmont's output verification pillar scans the model's final response for credential patterns, API key formats, private key markers, and other sensitive data before it's shown to the user or passed to the next tool call. Credentials never leave the machine — Clawmont processes everything locally.

LLM07 — Insecure Plugin Design

The risk: Plugins with excessive permissions or inadequate input validation become attack vectors.

Agent-specific danger: In the MCP (Model Context Protocol) ecosystem, tools are essentially plugins that the model can invoke. Each tool's permissions define the agent's blast radius. Clawmont sits between the model and every tool call, enforcing security checks regardless of the tool's own design. Even a poorly designed tool is safer behind Clawmont's dispatch layer.

LLM08 — Excessive Agency

The risk: LLM-based systems have too much autonomy, performing actions beyond their intended scope.

Agent-specific danger: This is the defining risk of autonomous agents. The whole point is that they act without human approval for each step. Clawmont provides guardrails within that autonomy — the agent can still operate freely within safe boundaries, but dangerous actions are blocked and logged.

LLM09 — Overreliance

The risk: Users trust LLM output without verification, leading to misinformation, security vulnerabilities in generated code, or legal liability.

Agent-specific danger: When an agent writes and commits code, deploys infrastructure, or modifies configuration, the "overreliance" isn't just accepting bad text — it's shipping vulnerabilities to production. Clawmont's audit trail logs every tool call and model decision, making it possible to review what the agent did and why.

LLM10 — Model Theft

The risk: Unauthorized access to or exfiltration of proprietary LLM models.

Agent-specific danger: For self-hosted models, agents with broad file system access could be tricked into reading and exfiltrating model weights. For API-based models, prompt extraction attacks can reveal system prompts and few-shot examples. Clawmont's input and output rails catch exfiltration patterns regardless of what's being exfiltrated — model weights or credentials.

Putting it all together

The OWASP LLM Top 10 provides a useful framework for thinking about LLM security, but autonomous agents amplify every risk on the list. The key insight is that agents turn text-generation risks into code-execution risks. A defense strategy that only addresses the model layer — prompt hardening, output filtering — misses the critical tool-call boundary where real damage happens.

Clawmont's four-pillar approach addresses the full attack lifecycle: input, tool dispatch, tool response, and output. Each pillar is independently tested against the OWASP risks, and detection rates are published transparently — including bypasses.

See the four pillars in action

Try real OWASP attack patterns against Clawmont's live detection engine.

Security Playground Security deep dive