MCP Server Security: Vet Any Server Before You Install
MCP server security for the people who install them: a four-gate vetting routine and the real package names, so a trojaned skill can't run as your agent.
> This is the install-time checklist. AI Agent Security shows where the vetted server still runs as your agent, and how the sandbox contains it.

AI Agent Security: Lock Down Claude Code, MCP Servers & OpenClaw
Stop Prompt Injection, Scope Your Credentials, and Ship Agents That Can't Be Owned
Summary:
- Why installing an MCP server is handing it your agent’s permissions.
- The four gates to run before every install, with a completed reject worksheet.
- The real npm token-stealer that a clean GitHub repo hid.
- The exact official MCP package names, so a typosquat can’t undo your vetting.
MCP server security is the one discipline that can undo every other defense in a single command. You run npm install on a helpful-looking skill, or claude mcp add on some server, and a stranger’s code is now living inside every wall you built, running with your agent’s permissions, reading whatever your agent can read. The locks are all still on the doors. You just handed someone a key and walked them in.
This is not paranoia, it’s a base rate. A 2026 study (arXiv:2601.10338) analyzed 31,132 agent skills and found 26.1% had at least one security vulnerability. More than one in four. So the question isn’t whether to vet what you install. It’s how, fast, on every install. Here’s the routine.
What is MCP server security, really?
It’s vetting executable trust before you install it. When you connect a skill, plugin, or MCP server, you’re not adding a dependency your code calls in a controlled way. You’re handing a tool to your agent, which calls it with your agent’s access, steered by whatever input your agent reads. If your agent can read files, the skill can read files. If your agent holds a brokered token, the skill runs in the process that requests it.
Every defense in agent hardening protects you from the agent being tricked. Almost none protect you from a tool that was malicious to begin with. That tool isn’t being aimed at your capabilities by an attacker; it brought its own intentions and inherited your access to act on them. So the work is to inspect the tool before it’s inside the walls.

The four gates: run them before every install
You don’t have to audit every line of every dependency. You need four gates, and a real trojan trips at least two of them:
Run before every skill / plugin / MCP server install:
[ ] 1. PROVENANCE: does the registry artifact match the audited source? (a clean GitHub repo is NOT a clean npm package)
[ ] 2. PERMISSION: circle every requested permission the one-sentence purpose does NOT explain
[ ] 3. STATIC SCAN: run a scanner: NVIDIA SkillSpector, or the ATR scanner (npm agent-threat-rules / PyPI pyatr)
[ ] 4. SANDBOX: detonate it in the box with egress on; watch what it actually does on a real run
Gate two is the one that’s free and catches the most. Write down the one sentence of what the tool is for, then write down every permission it requests, and circle every permission that sentence doesn’t explain. A text formatter that wants the network has no innocent reason to want the network. Malice hides in the gap between purpose and access. (On the scanner: the Agent Threat Rules project is a young, single-maintainer open-source tool, useful as a runnable scanner, not an industry standard. Frame its results as “this rule set flagged X,” never “the standard says X.”)
The completed worksheet: markdown-pretty-printer
The gates are only worth something if they produce a written verdict you can point at. Here is the routine run to completion on a skill called “Markdown Pretty-Printer.” Its job is to reformat text. Read the permission rows and the verdict writes itself:
PACKAGE: markdown-pretty-printer
SOURCE: npmjs.com/package/markdown-pretty-printer
CLAIMED PURPOSE: format local markdown
PUBLISHER: account 3 weeks old, only package [gate 1 PROVENANCE: FAIL]
ARTIFACT MATCHES SOURCE: no (npm tarball not reproducible from the GitHub repo)
REQUESTED PERMISSIONS: [gate 2 PERMISSION: FAIL]
read current file ........ JUSTIFIED (it formats the file)
write current file ....... JUSTIFIED (it writes the result)
read ~/.codex/auth.json .. NOT JUSTIFIED (a formatter has no business here)
outbound HTTPS ........... NOT JUSTIFIED (a formatter needs no network)
STATIC SCAN: reads a credential path; POSTs to a telemetry-shaped endpoint [gate 3: WARN]
SANDBOX TRIAL: tried to reach that endpoint; blocked by the egress allowlist [gate 4: WARN]
VERDICT: REJECT (purpose explains 2 of 4 permissions; the other 2 are the attack)
Two of the four requested permissions, reading ~/.codex/auth.json and outbound HTTPS, have nothing to do with formatting markdown. Those two are the attack. The worksheet’s whole value is that it forces the comparison, what the tool is for against what it asks for, and a reject becomes obvious instead of agonized.
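The comparison the worksheet forces can be written down as a few lines of Python. This is a minimal sketch, not a tool: the `Permission` record and the `audit` function are hypothetical names invented for illustration, and the `justified` flag is still a human judgment you make by reading each permission against the one-sentence purpose. The data is the worksheet's own four permission rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    name: str
    justified: bool  # human judgment: does the one-sentence purpose explain it?
    reason: str


def audit(permissions):
    """Return (verdict, unexplained): any unexplained permission is a reject."""
    unexplained = [p for p in permissions if not p.justified]
    verdict = "REJECT" if unexplained else "PASS GATE 2"
    return verdict, unexplained


# The four rows from the markdown-pretty-printer worksheet.
requested = [
    Permission("read current file", True, "it formats the file"),
    Permission("write current file", True, "it writes the result"),
    Permission("read ~/.codex/auth.json", False, "a formatter has no business here"),
    Permission("outbound HTTPS", False, "a formatter needs no network"),
]

verdict, gap = audit(requested)
print(verdict)
for p in gap:
    print(f"  NOT JUSTIFIED: {p.name} ({p.reason})")
```

The code does nothing clever on purpose. Its only job is to make the gap between purpose and access a written list rather than a feeling.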
The attack this catches: codexui-android
That worksheet isn’t hypothetical. Aikido uncovered a malicious npm package, codexui-android, that had reached roughly 29,000 weekly downloads. The mechanics are a master class. The public GitHub repository stayed clean for about a month, building stars and trust. Then a malicious update was published only to the npm registry, not to GitHub. Anyone who glanced at the source saw clean code; anyone who ran npm install got the poisoned artifact. On run, it read ~/.codex/auth.json, pulled out the access token and the unexpiring refresh token, and exfiltrated them to a server dressed up to look like Sentry so the traffic blended into telemetry. One associated app in the campaign was named “OpenClaw Codex Claude AI Agent,” wrapping three trusted names into one piece of bait.
Two lessons land hard. A clean GitHub is not a clean package, because the thing you audit (the repo) and the thing you run (the registry artifact) can differ, which is exactly what gate one checks. And the long-lived secret on disk is the prize, which is why a context holding only a five-minute brokered token is a far less rewarding target than a ~/.codex/auth.json full of unexpiring ones. (Keep this distinct from a separate, also-real campaign that ANY.RUN reported: fake Codex and Claude installer pages hosted on Google Sites. That is a different attack: a malicious delivery page rather than a malicious registry artifact. Don’t conflate them.)
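Gate one's repo-versus-artifact check can be roughed out as a file-hash comparison. The sketch below assumes you have already unpacked the registry tarball into one directory and checked out the repository at the matching version into another; `file_hashes` and `diff_artifact` are hypothetical helper names. Legitimate packages often ship build output that the repo does not contain, so treat every difference as a lead to read, not as proof of tampering.

```python
import hashlib
from pathlib import Path


def file_hashes(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_artifact(artifact_dir, source_dir):
    """Compare an unpacked registry artifact against a source checkout.

    Returns (only_in_artifact, changed): files the repo never showed you,
    and files present in both whose contents differ.
    """
    artifact = file_hashes(artifact_dir)
    source = file_hashes(source_dir)
    only_in_artifact = sorted(set(artifact) - set(source))
    changed = sorted(
        path
        for path in set(artifact) & set(source)
        if artifact[path] != source[path]
    )
    return only_in_artifact, changed
```

Call `diff_artifact("unpacked-tarball/", "repo-checkout/")` and read every path it returns. A file that exists only in the artifact, or differs from the repo, is exactly where a registry-only update like the one above would show up.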
Get the package name exactly right
A vetting routine that ends in installing the package one character off the real one ends in a compromise anyway. Typosquatting lives in the gap between the name you remember and the name that’s real. Here are the official servers, verified live on the registries:
| Package (exact name) | Registry | Status |
|---|---|---|
| @modelcontextprotocol/server-filesystem | npm | Real, latest 2026.1.14 |
| @modelcontextprotocol/server-memory | npm | Real (scoped) |
| @modelcontextprotocol/server-sequential-thinking | npm | Real (note the hyphen) |
| mcp-server-fetch, mcp-server-git, mcp-server-time | PyPI | Real (these are PyPI, not npm) |
| @anthropic-ai/mcp-server-* | npm | Does not resolve to a package |
Source: npm, @modelcontextprotocol/server-filesystem. Two traps to memorize: the no-hyphen server-sequentialthinking is a 404 and a typosquatter’s dream, and the entire @anthropic-ai/mcp-server-* family that people assume exists does not. Some official servers live on npm under the @modelcontextprotocol scope; others (fetch, git, time) are PyPI packages. Install from the wrong registry or the wrong scope and the four gates above never even ran.
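The name check is mechanical enough to script. This is a rough sketch under stated assumptions: the `OFFICIAL` allowlist is copied from the table above, `check_name` is a hypothetical helper, and the 0.85 similarity cutoff is an arbitrary starting point rather than a tuned value. It only compares strings; it does not query any registry.

```python
import difflib

# Allowlist copied from the table above: registry -> exact official names.
OFFICIAL = {
    "npm": {
        "@modelcontextprotocol/server-filesystem",
        "@modelcontextprotocol/server-memory",
        "@modelcontextprotocol/server-sequential-thinking",
    },
    "pypi": {"mcp-server-fetch", "mcp-server-git", "mcp-server-time"},
}


def check_name(registry, name):
    """Classify a package name before install; exact matches only pass."""
    known = OFFICIAL.get(registry, set())
    if name in known:
        return "exact match"
    # Right name, wrong registry (for example a PyPI server requested from npm).
    for other, names in OFFICIAL.items():
        if other != registry and name in names:
            return f"wrong registry: official package lives on {other}"
    # Near miss: the cutoff is an assumption, adjust it to taste.
    near = difflib.get_close_matches(name, sorted(known), n=1, cutoff=0.85)
    if near:
        return f"possible typosquat of {near[0]}"
    return "not on the official list"


print(check_name("npm", "@modelcontextprotocol/server-sequentialthinking"))
print(check_name("npm", "mcp-server-fetch"))
```

Anything other than "exact match" should stop the install until you have looked the name up yourself.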
Reading a scanner’s output
Gate three produces findings, and the trap is treating them as verdicts. Treat them as leads, triaged by what they’d let an attacker do:
Treat scanner results as leads, not verdicts:
HIGH: credential-file read, eval of remote input -> inspect now, assume guilty
MEDIUM: network call, broad filesystem glob -> compare against stated purpose
LOW: noisy dependency, overbroad but unused perm -> tighten or sandbox, don't panic
A clean scan is not a clean bill of health (scanners miss novel attacks) and a flagged scan is not a conviction (scanners false-positive constantly). For every HIGH, go look at the code it points to: a HIGH on a ~/.codex/auth.json read in a markdown formatter is your reject; the same read in an actual authentication helper might be the whole point of the tool. Severity tells you where to look first, not what to conclude.
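The triage table translates into a small sorting step. The finding format below (a dict with `severity` and `rule` keys) is a made-up shape for illustration, not the output format of SkillSpector, the ATR scanner, or any other real tool; you would adapt it to whatever your scanner emits.

```python
SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

# Actions taken from the triage table above.
ACTION = {
    "HIGH": "inspect now, assume guilty",
    "MEDIUM": "compare against stated purpose",
    "LOW": "tighten or sandbox, don't panic",
}


def triage(findings):
    """Order findings by severity and attach the action for each.

    Severity decides where to look first; it does not decide the verdict.
    """
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
    return [(f["severity"], f["rule"], ACTION[f["severity"]]) for f in ranked]


# Hypothetical findings for a markdown formatter.
findings = [
    {"severity": "LOW", "rule": "noisy dependency"},
    {"severity": "MEDIUM", "rule": "network call"},
    {"severity": "HIGH", "rule": "credential-file read"},
]

for severity, rule, action in triage(findings):
    print(f"{severity:<6} {rule:<22} -> {action}")
```

The output is a reading order. The conclusion still comes from opening the code each finding points to.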
What should you actually do?
- Before any install → run the four gates. Provenance and permission audit are free and catch most of it; do those two even when you’re rushed.
- For a closed-source skill or remote server you can’t read → you can’t run the static scan on a black box, so lean harder on the gates you can: tighten permissions to the minimum, keep it permanently sandboxed with a strict egress allowlist, and treat its every output as untrusted.
- When a tool is both unauditable AND demands broad access → that combination is your answer. Don’t install it. “I couldn’t check but I trusted it anyway” is the sentence that precedes most breaches.
- Keep the checklist where you install → the gate only works if it stands between you and the install command, not in a doc you read once.
The bottom line
- Installing a server is granting it your agent’s access. Treat npm install and claude mcp add as security events, because they are.
- The loudest signal of a malicious tool is free to check: a permission its stated purpose can’t explain. Circle the gap, and the reject writes itself.
- A clean repo is not a clean package, and the right name is not the name you half-remember. Verify the artifact and the exact scope, or the vetting was theater.
Frequently Asked Questions
How do you check if an MCP server is safe?
Run four gates before you install: verify the registry artifact matches the audited source, audit requested permissions against the stated purpose, run a static scanner, and detonate it in a sandbox with egress locked down. A permission the purpose can't explain is your reject.
What is the biggest MCP server security risk?
A skill or server that requests access its stated job doesn't need. Malice hides in the gap between purpose and permissions. The 2026 study cited above (arXiv:2601.10338) found that 26.1% of the 31,132 agent skills it analyzed had at least one security vulnerability.
Are MCP servers safe to install from npm?
Only after vetting, and only from the exact official scope. A clean GitHub repo is not a clean npm package, and typosquatted names one character off the real one are a common attack. Verify the scoped name before you run install.