Securing MCP Servers in Agentic AI

The Model Context Protocol (MCP) lets large language models interact with external tools, turning them from passive text generators into active agents. The promise is huge—agents can fetch data, trigger workflows and orchestrate complex tasks. However, each MCP connection creates a bridge between untrusted model outputs and sensitive systems: a single weak point (a poisoned prompt, an over‑broad permission or an unverified tool) can be abused. Despite the availability of an MCP specification, most reference servers implement only the basic transport and authorization flows. They omit crucial hardening such as verification of incoming requests, secure session identifiers and tooling policies. As we deploy agentic AI more widely, we need a security layer that goes beyond protocol compliance.

This article distills the major risks facing MCP deployments, highlights existing best practices from the protocol specification, and proposes a configuration‑first framework that lets operators add strong controls without rewriting their servers. The goal is a secure foundation that still unlocks the potential of agentic AI.

Why Standard MCP Servers Fall Short

The MCP draft security document warns that servers must verify every inbound request, must not use sessions for authentication and must generate non‑deterministic session identifiers. It also notes that local MCP servers running on a user’s machine can be compromised via malicious startup commands, DNS‑rebinding or hidden code in downloaded binaries, and it recommends clear consent dialogs, restricted filesystem access and sandboxed execution to mitigate these risks. Unfortunately, the reference implementations that many teams adopt often implement the protocol but omit these mitigations: some proxy tokens to downstream APIs, log sensitive inputs, or expose unvetted tools. The consequences are not hypothetical. Researchers have shown how prompt‑injection attacks can trick assistants into ignoring safety instructions and exfiltrating secrets, and in May 2025 a critical SQL‑injection flaw was discovered in a popular SQLite‑based MCP server that had already been forked thousands of times, a stark reminder of supply‑chain risks.
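To make the session guidance concrete, here is a minimal Python sketch of non‑deterministic session identifiers. It assumes the server keys each session as `<user_id>:<random token>`, so a guessed or stolen token cannot be replayed under another identity. The function names `new_session_id` and `session_belongs_to` are hypothetical, and the ownership check is meant to complement per‑request authentication, not replace it.

```python
import hmac
import secrets


def new_session_id(user_id: str) -> str:
    """Return a session key of the form '<user_id>:<random token>'."""
    # token_urlsafe(32) draws 32 bytes from the OS CSPRNG; the result
    # never contains ':', so the key can be split safely later.
    return f"{user_id}:{secrets.token_urlsafe(32)}"


def session_belongs_to(session_key: str, user_id: str) -> bool:
    """Check that a session key was issued for the authenticated user."""
    owner, sep, token = session_key.rpartition(":")
    if sep != ":" or not token:
        return False
    # Constant-time comparison of the owner part against the caller.
    return hmac.compare_digest(owner.encode(), user_id.encode())
```

The caller's identity must still come from a verified credential on every request; the session key only ties server‑side state to that identity.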

The fundamental problem is that an MCP server acts as a trusted interpreter between a model and the real world. Without strong controls, malicious inputs or misconfigured tools can lead to data loss, code execution or privilege escalation. To build trust, we need to wrap the server with policy gates that are easy to configure and audit.

Major Threat Categories

Before designing defences, it helps to understand the threats unique to agentic AI. The list below summarises the principal risk categories and high‑level mitigations drawn from recent analyses and best‑practice guides:

Prompt/indirect injection: Attackers trick the model with embedded instructions.

Mitigation: Validate inputs, use allow-lists, and require human approval for risky operations.

Code/command injection: Untrusted strings allow arbitrary code execution.

Mitigation: Sanitize inputs (escape characters), and run tools in restricted containers.

Token/credential leakage: Improperly stored tokens allow attackers to impersonate clients.

Mitigation: Use short-lived tokens, avoid logging secrets, and store tokens securely.

Excessive permissions: Overly broad scopes give tools too much access.

Mitigation: Apply least-privilege scopes and audit permissions in CI/CD.

Supply-chain attacks: Malicious dependencies compromise servers.

Mitigation: Pin package versions, use signed registries, and scan for vulnerabilities.

Local server compromise: Untrusted local servers can run arbitrary commands.

Mitigation: Display commands before execution, use sandboxes, and restrict network/filesystem access.

These threats underscore the need for comprehensive defences that operate at multiple stages of the tool lifecycle.
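As an illustration of the code/command injection mitigation, the following Python sketch runs a tool from an allow‑list, validates the argument against a strict pattern, and passes it as a separate argv element so no shell ever parses it. The `ALLOWED_TOOLS` table, the `SAFE_ARG` pattern and the `run_tool` helper are assumptions made for this example; a real deployment would also run the process inside a sandbox.

```python
import re
import subprocess
import sys

# Hypothetical allow-list: tool name -> fixed command prefix.
ALLOWED_TOOLS = {
    "echo": [sys.executable, "-c", "import sys; print(sys.argv[1])"],
}
# Conservative argument pattern: no spaces, quotes or shell metacharacters.
SAFE_ARG = re.compile(r"[A-Za-z0-9_./-]{1,128}")


def run_tool(tool: str, arg: str) -> str:
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allow-listed: {tool}")
    # Reject leading '-' as well, so the value cannot pose as an option.
    if arg.startswith("-") or not SAFE_ARG.fullmatch(arg):
        raise ValueError("argument failed validation")
    # List form (no shell=True): the argument is never shell-interpreted.
    result = subprocess.run(
        [*ALLOWED_TOOLS[tool], arg],
        capture_output=True, text=True, timeout=5, check=True,
    )
    return result.stdout.strip()
```

Validation and list‑form invocation are independent layers: even if the pattern were loosened, the argument would still reach the child process as a single string.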

Two Layers of Protection: Registration and Execution

To reduce risk without stifling productivity, we propose a two‑layer security model for MCP servers. Instead of embedding all logic in the server code, operators define policies and manifests in configuration files. A lightweight policy engine then enforces these rules at registration time and at each tool call. This approach allows teams to adopt strong safeguards with minimal changes to existing infrastructure.

1 Registration‑Time Policy (Gate to Production)

Before a tool is made available to models, it must pass through a registration gate. Each tool publishes a manifest describing its capabilities, scopes, allowed data classes, network destinations and security requirements. During CI/CD, a policy evaluator checks this manifest for issues such as:

Least‑privilege scopes – Tools should declare only the operations they need. Read and write scopes must be separated; mixing them in one token is forbidden.
Network egress allow‑lists – The manifest must specify allowed domains and ports; wildcards are denied by default.
Data‑handling rules – Tools that process personal data must declare redaction rules and classification levels. PII should be redacted or hashed before output.
Supply‑chain hygiene – Each tool must include a software bill of materials (SBOM); the CI pipeline scans for known CVEs and rejects unsigned dependencies.
Red‑team tests – Operators maintain a suite of malicious prompts (jailbreaks, injection patterns) and sample inputs. Tools must demonstrate resistance to these prompts before being approved for production.

Automated checks enforce these rules consistently. Tools that fail are never published, preventing “weaponised” functions from entering the system.
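A registration gate of this kind can be sketched in a few lines of Python. The manifest fields, the `resource:read` / `resource:write` scope naming and the `registration_violations` function are assumptions made for illustration; in practice the same rules would more likely live in a policy engine and run as a CI step.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolManifest:
    # Hypothetical manifest shape; field names are illustrative only.
    name: str
    version: str
    scopes: List[str]                  # e.g. "tickets:read"
    egress: List[str]                  # e.g. "api.example.com:443"
    sbom_path: Optional[str] = None
    handles_pii: bool = False
    redaction_rules: List[str] = field(default_factory=list)


def registration_violations(m: ToolManifest) -> List[str]:
    """Return every policy violation; an empty list means the tool passes."""
    violations = []
    kinds = {scope.rsplit(":", 1)[-1] for scope in m.scopes}
    if {"read", "write"} <= kinds:
        violations.append("mixed read/write scopes")
    if any("*" in dest for dest in m.egress):
        violations.append("wildcard egress")
    if not m.sbom_path:
        violations.append("missing SBOM")
    if m.handles_pii and not m.redaction_rules:
        violations.append("PII without redaction rules")
    return violations
```

Returning the full list of violations, instead of failing on the first one, lets a tool author fix everything in a single CI round trip.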

2 Execution‑Time Policy (Every Call)

Even well‑designed tools can misbehave in the presence of adversarial inputs. Execution‑time policies apply to every invocation to enforce least privilege and detect data exfiltration. Typical controls include:

Identity and scope validation – Each call is bound to the caller’s identity and scopes; calls lacking required scopes are rejected.
Input sanitisation – Incoming arguments are validated against JSON Schema or Protobuf definitions. Strings are escaped; payloads are limited in length.
Sandboxed execution – Tools run in isolated containers or virtual machines with read‑only filesystems, CPU/memory quotas and network egress filtered through a proxy. This limits the blast radius of any compromise.
Data‑loss prevention (DLP) – Outputs are scanned for secrets, personal data or blocked phrases. PII is redacted according to the tool’s manifest; responses containing disallowed patterns (e.g., private keys, API tokens) are blocked.
Human approval for high‑risk actions – Tools that modify data or perform external side‑effects must collect explicit user confirmation or satisfy policy conditions (time windows, dry‑run first) before executing.
Audit and observability – All prompts, tool invocations, decisions and outputs are logged with caller identity and timestamps. Logs feed into SIEM/SOC pipelines for monitoring and forensic analysis.

By separating registration and execution policies, we can catch misconfigurations early and still enforce granular controls on each call. The policies live in configuration (YAML/JSON and Rego rules) and can be updated without touching server code, making the system adaptable to new threats.
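Two of the execution‑time controls, scope validation and output DLP, might look like the following Python sketch. The patterns are deliberately small and illustrative (a PEM private‑key header, a string shaped like an AWS access key ID, and email addresses standing in for PII); the names `authorize` and `filter_output` are hypothetical, and a real rule set would be loaded from the tool's manifest.

```python
import re
from typing import Set

# Illustrative secret patterns; a production list would be far longer.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def authorize(caller_scopes: Set[str], required: Set[str]) -> None:
    """Reject the call unless the caller holds every required scope."""
    missing = required - caller_scopes
    if missing:
        raise PermissionError(f"missing scopes: {sorted(missing)}")


def filter_output(text: str) -> str:
    """Block secret-like output outright; redact email addresses."""
    if any(p.search(text) for p in SECRET_PATTERNS):
        raise ValueError("output blocked: secret-like pattern")
    return EMAIL.sub("[REDACTED_EMAIL]", text)
```

Blocking on secrets while merely redacting PII reflects the different failure costs: a leaked key is unrecoverable, whereas a redacted address still leaves a usable response.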

Building a Config‑First Security Framework

Putting these ideas into practice involves a few concrete steps:

Define a manifest schema – Create a YAML or JSON schema that all tools must follow. Required fields include name, version, scopes, network allow‑list, data classification, redaction rules and SBOM pointer.
Adopt policy‑as‑code – Use an engine like Open Policy Agent (OPA) to express registration and execution rules. Policies can deny tools with mixed read/write scopes, wildcard egress or missing SBOMs, and enforce DLP at runtime.
Integrate into CI/CD – During build and deployment, validate tool manifests, run red‑team tests and scan dependencies. Reject any tool that fails policy checks.
Instrument the middleware – Add a small middleware between the MCP server and the tool execution environment. It loads the execution policy, validates the call, runs the tool in a sandbox and applies post‑processing (redaction, DLP) before returning the response.
Maintain a policy registry – Host global and per‑tool configuration files in a version‑controlled repository. Security and compliance teams can update policies quickly as new threats emerge.

This approach transforms security from an afterthought into a first‑class configuration artefact. Developers still write tools and models as before, but the platform enforces consistency and safety through clear policies.
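The middleware step can be pictured as a thin wrapper around each tool call. The Python sketch below is one possible shape under stated assumptions: `guarded_call`, its parameters and the audit record format are hypothetical, and the sandbox is reduced to a direct function call to keep the example self‑contained. Note that the audit record carries argument names but not values, to avoid logging secrets.

```python
import json
import time
from typing import Any, Callable, Dict, Set


def guarded_call(
    tool: Callable[..., Any],
    args: Dict[str, Any],
    *,
    caller: str,
    caller_scopes: Set[str],
    required_scopes: Set[str],
    max_arg_len: int = 1024,
    audit: Callable[[str], None] = print,
) -> Any:
    def record(decision: str, detail: str = "") -> None:
        # One JSON line per decision, ready for a log shipper.
        audit(json.dumps({
            "ts": time.time(),
            "caller": caller,
            "tool": getattr(tool, "__name__", repr(tool)),
            "args": sorted(args),      # names only, never values
            "decision": decision,
            "detail": detail,
        }))

    if not required_scopes <= caller_scopes:
        record("deny", "missing scope")
        raise PermissionError("caller lacks required scopes")
    for key, value in args.items():
        if not isinstance(value, str) or len(value) > max_arg_len:
            record("deny", f"invalid argument: {key}")
            raise ValueError(f"invalid argument: {key}")
    # Assumption: a real deployment would dispatch to a sandbox here.
    result = tool(**args)
    record("allow")
    return result
```

Because the policy inputs (`required_scopes`, `max_arg_len`) are plain parameters, they can be loaded from the version‑controlled policy registry without changing the wrapper or the tool.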

Conclusion

Agentic AI systems built on MCP promise large productivity gains, but that power carries matching risk. The standard specification provides essential protocol and authorization mechanics, but it does not enforce secure session handling, tool vetting or runtime data‑loss prevention. Recent incidents, including prompt‑injection demonstrations and supply‑chain vulnerabilities, highlight the real risks.

To deploy MCP safely at scale, organizations should adopt a config‑first security framework. By vetting tools at registration, enforcing granular policies during execution and maintaining comprehensive logs, we can prevent many of the attacks that plague naive deployments. Policies expressed as code allow security teams to adapt quickly, while developers focus on innovation. In short, secure configuration—not wishful thinking—will determine whether agentic AI becomes a trusted assistant or a liability.

References

Model Context Protocol (MCP) Specification – Security Best Practices https://modelcontextprotocol.io/specification/draft/basic/security_best_practices
WorkOS Blog – The Complete Guide to MCP Security: How to Secure MCP Servers & Clients https://workos.com/blog/mcp-security-risks-best-practices
OWASP Top 10 for Large Language Models (LLM Security Risks) https://owasp.org/www-project-top-10-for-llms/
NIST AI Risk Management Framework (AI RMF 1.0, January 2023) https://www.nist.gov/itl/ai-risk-management-framework
