AgentShield

Security auditor for AI agent configurations

Scans Claude Code setups for hardcoded secrets, permission misconfigs,
hook injection, MCP server risks, and agent prompt injection vectors.

Quick Start · What It Catches · Opus Pipeline · GitHub Action · MiniClaw

Why

The AI agent ecosystem is growing faster than its security tooling. In January 2026 alone:

- 12% of a major agent skill marketplace was malicious (341 of 2,857 community skills)
- A CVSS 8.8 CVE exposed 17,500+ internet-facing instances to one-click RCE
- The Moltbook breach compromised 1.5M API tokens across 770,000 agents

Developers install community skills, connect MCP servers, and configure hooks without any automated way to audit the security of their setup. AgentShield scans your .claude/ directory and flags vulnerabilities before they become exploits.

Built at the Claude Code Hackathon (Cerebral Valley x Anthropic, Feb 2026). Part of the Everything Claude Code ecosystem (42K+ stars).

Quick Start

# Scan your Claude Code config (no install required)
npx ecc-agentshield scan

# Or install globally
npm install -g ecc-agentshield
agentshield scan

That's it. AgentShield auto-discovers your ~/.claude/ directory, scans all config files, and prints a graded security report.

  AgentShield Security Report

  Grade: F (0/100)

  Score Breakdown
  Secrets        ░░░░░░░░░░░░░░░░░░░░ 0
  Permissions    ░░░░░░░░░░░░░░░░░░░░ 0
  Hooks          ░░░░░░░░░░░░░░░░░░░░ 0
  MCP Servers    ░░░░░░░░░░░░░░░░░░░░ 0
  Agents         ░░░░░░░░░░░░░░░░░░░░ 0

  ● CRITICAL  Hardcoded Anthropic API key
    CLAUDE.md:13
    Evidence: sk-ant-a...cdef
    Fix: Replace with environment variable reference [auto-fixable]

  ● CRITICAL  Overly permissive allow rule: Bash(*)
    settings.json
    Evidence: Bash(*)
    Fix: Restrict to specific commands: Bash(git *), Bash(npm *), Bash(node *)

  Summary
  Files scanned: 6
  Findings: 73 total — 19 critical, 29 high, 15 medium, 4 low, 6 info
  Auto-fixable: 8 (use --fix)

More commands

# Scan a specific directory
agentshield scan --path /path/to/.claude

# Auto-fix safe issues (replaces hardcoded secrets with env var references)
agentshield scan --fix

# JSON output for CI pipelines
agentshield scan --format json

# Generate an HTML security report
agentshield scan --format html > report.html

# Three-agent Opus 4.6 adversarial analysis (requires ANTHROPIC_API_KEY)
agentshield scan --opus --stream

# Generate a secure baseline config
agentshield init

What It Catches

102 rules across 5 categories, graded A–F with a 0–100 numeric score.
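
The mapping from numeric score to letter grade is not spelled out here. As a rough sketch only, assuming conventional 90/80/70/60 cut-offs (an assumption, not AgentShield's documented thresholds) and a hypothetical `gradeFor` helper:

```typescript
// Hypothetical helper: the cut-offs below are assumed, not taken from AgentShield.
type Grade = "A" | "B" | "C" | "D" | "F";

function gradeFor(score: number): Grade {
  // Clamp to the documented 0-100 range before grading.
  const s = Math.max(0, Math.min(100, score));
  if (s >= 90) return "A";
  if (s >= 80) return "B";
  if (s >= 70) return "C";
  if (s >= 60) return "D";
  return "F";
}
```

Under these assumed cut-offs, the sample report's score of 0 maps to F, which matches the grade shown in Quick Start.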

Secrets Detection (10 rules, 14 patterns)

| What | Examples |
| --- | --- |
| API keys | Anthropic (`sk-ant-`), OpenAI (`sk-proj-`), AWS (`AKIA`), Google (`AIza`), Stripe (`sk_test_`/`sk_live_`) |
| Tokens | GitHub PATs (`ghp_`/`github_pat_`), Slack (`xox[bprs]-`), JWTs (`eyJ...`), Bearer tokens |
| Credentials | Hardcoded passwords, database connection strings (postgres/mongo/mysql/redis), private key material |
| Env leaks | Secrets passed through environment variables in configs, `echo $SECRET` in hooks |
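
To make the prefix-based detection concrete, here is a minimal sketch. The prefixes come from the table above, but the character classes, minimum lengths, and the names `PATTERNS`, `mask`, and `scanForSecrets` are illustrative assumptions, not AgentShield's actual rules.

```typescript
// Illustrative patterns only: prefixes are from the table above; the character
// classes and lengths are assumptions, not AgentShield's real rule definitions.
interface SecretPattern {
  name: string;
  regex: RegExp;
}

const PATTERNS: SecretPattern[] = [
  { name: "Anthropic API key", regex: /sk-ant-[A-Za-z0-9_-]{8,}/ },
  { name: "AWS access key ID", regex: /AKIA[0-9A-Z]{16}/ },
  { name: "GitHub PAT", regex: /(?:ghp_|github_pat_)[A-Za-z0-9_]{20,}/ },
  { name: "Slack token", regex: /xox[bprs]-[A-Za-z0-9-]{10,}/ },
];

interface SecretFinding {
  name: string;
  line: number;
  evidence: string;
}

// Keep the first 8 and last 4 characters so a report never echoes a full secret.
function mask(secret: string): string {
  return secret.length <= 12 ? "***" : `${secret.slice(0, 8)}...${secret.slice(-4)}`;
}

function scanForSecrets(text: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  text.split("\n").forEach((lineText, index) => {
    for (const pattern of PATTERNS) {
      // A fresh global regex per line avoids shared lastIndex state.
      const re = new RegExp(pattern.regex.source, "g");
      let match: RegExpExecArray | null;
      while ((match = re.exec(lineText)) !== null) {
        findings.push({ name: pattern.name, line: index + 1, evidence: mask(match[0]) });
      }
    }
  });
  return findings;
}
```

Masking the evidence mirrors the `sk-ant-a...cdef` form shown in the sample report, so the finding can be located without the report itself leaking the key.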

Permission Audit (10 rules)

| What | Examples |
| --- | --- |
| Wildcard access | `Bash(*)`, `Write(*)`, `Edit(*)`: unrestricted tool permissions |
| Missing deny lists | No deny rules for `rm -rf`, `sudo`, `chmod 777` |
| Dangerous flags | `--dangerously-skip-permissions` usage |
| Mutable tool exposure | All mutable tools (Write, Edit, Bash) allowed without scoping |
| Destructive git | `git push --force`, `git reset --hard` in allowed commands |
| Unrestricted network | `curl *`, `wget *`, `ssh *`, `scp *` in allow list without scope |
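
A sketch of how the wildcard and missing-deny checks might look. It assumes `settings.json` carries a `permissions` object with `allow` and `deny` string arrays. The `auditPermissions` name and the severity assigned to missing deny rules are hypothetical.

```typescript
// Assumed shape: a `permissions` object with `allow` and `deny` string arrays.
interface Settings {
  permissions?: { allow?: string[]; deny?: string[] };
}

interface PermissionFinding {
  severity: "critical" | "high";
  message: string;
}

const MUTABLE_TOOLS = ["Bash", "Write", "Edit"];

function auditPermissions(settings: Settings): PermissionFinding[] {
  const allow = settings.permissions?.allow ?? [];
  const deny = settings.permissions?.deny ?? [];
  const findings: PermissionFinding[] = [];

  for (const rule of allow) {
    // A bare tool name or Tool(*) grants the tool with no scoping at all.
    const wildcard = MUTABLE_TOOLS.some((tool) => rule === tool || rule === `${tool}(*)`);
    if (wildcard) {
      findings.push({ severity: "critical", message: `Overly permissive allow rule: ${rule}` });
    }
  }

  // Flag configs that never deny obviously destructive commands.
  // (Severity "high" here is an assumption for illustration.)
  for (const dangerous of ["rm -rf", "sudo", "chmod 777"]) {
    if (!deny.some((rule) => rule.includes(dangerous))) {
      findings.push({ severity: "high", message: `No deny rule covers: ${dangerous}` });
    }
  }
  return findings;
}
```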

Hook Analysis (34 rules)

| What | Examples |
| --- | --- |
| Command injection | `${file}` interpolation in shell commands: attacker-controlled filenames become code |
| Data exfiltration | `curl -X POST` with variable interpolation sending data to external URLs |
| Silent errors | `2>/dev/null`, `\|\| true`: failing security hooks that silently pass |
| Missing hooks | No PreToolUse hooks, no Stop hooks for session-end validation |
| Network exposure | Unthrottled network requests in hooks, sensitive file access without filtering |
| Session startup | SessionStart hooks that download and execute remote scripts |
| Package installs | Global `npm install -g`, `pip install`, `gem install`, `cargo install` in hooks |
| Container escape | Docker `--privileged`, `--pid=host`, `--network=host`, root volume mounts |
| Credential access | macOS Keychain, GNOME Keyring, /etc/shadow reads |
| Reverse shells | `/dev/tcp`, `mkfifo + nc`, Python/Perl socket shells |
| Clipboard access | `pbcopy`, `xclip`, `xsel`, `wl-copy`: exfiltration via clipboard |
| Log tampering | `journalctl --vacuum`, `rm /var/log`, `history -c`: anti-forensics |
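
As an illustration of pattern-based hook analysis, the sketch below checks a hook's command string against a few of the categories above. The rule ids, the regexes, and the `analyzeHook` function are hypothetical and cover only a small subset of what a real rule set would need.

```typescript
// Hypothetical rule set: each entry pairs a regex with the risk it signals.
interface HookRule {
  id: string;
  regex: RegExp;
}

const HOOK_RULES: HookRule[] = [
  // ${file}-style interpolation lets a crafted filename inject shell code.
  { id: "command-injection", regex: /\$\{\w+\}/ },
  // Discarding stderr or forcing success hides a failing security hook.
  { id: "silent-errors", regex: /2>\s*\/dev\/null|\|\|\s*true\b/ },
  // Bash's /dev/tcp pseudo-device is a common reverse-shell primitive.
  { id: "reverse-shell", regex: /\/dev\/tcp\// },
  // Clearing shell history or vacuuming the journal erases evidence.
  { id: "log-tampering", regex: /history\s+-c|journalctl\s+--vacuum/ },
];

// Returns the ids of every rule whose pattern appears in the hook command.
function analyzeHook(command: string): string[] {
  return HOOK_RULES.filter((rule) => rule.regex.test(command)).map((rule) => rule.id);
}
```

For example, `analyzeHook('prettier --write ${file} 2>/dev/null || true')` would flag both `command-injection` and `silent-errors`.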

MCP Server Security (23 rules)

| What | Examples |
| --- | --- |
| High-risk servers | Shell/command MCPs, filesystem with root access, database MCPs, browser automation |
| Supply chain | `npx -y` auto-install without confirmation: typosquatting vector |
| Hardcoded secrets | API tokens in MCP environment config instead of env var references |
| Remote transport | MCP servers connecting to remote URLs (SSE/streamable HTTP) |
| Shell metacharacters | `&&`, `\|`, `;` in MCP server command arguments |
| Missing metadata | No version pin, no description, excessive server count |
| Sensitive file args | `.env`, `.pem`, `credentials.json` passed as server arguments |
| Network exposure | Binding to `0.0.0.0` instead of localhost |
| Auto-approve | `autoApprove` settings that skip user confirmation for tool calls |
| Missing timeouts | High-risk servers without timeout: resource exhaustion risk |
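
Three of these checks can be sketched against a server map shaped like the common `mcpServers` config convention (`command` plus `args`). The shape, the `auditMcpServers` name, and the issue strings are assumptions made for illustration.

```typescript
// Assumed config shape: { <serverName>: { command, args } }.
interface McpServer {
  command: string;
  args?: string[];
}

interface McpFinding {
  server: string;
  issue: string;
}

function auditMcpServers(servers: Record<string, McpServer>): McpFinding[] {
  const findings: McpFinding[] = [];
  for (const [name, server] of Object.entries(servers)) {
    const args = server.args ?? [];
    // npx -y installs whatever the package name resolves to, without a prompt.
    if (server.command === "npx" && (args.includes("-y") || args.includes("--yes"))) {
      findings.push({ server: name, issue: "npx auto-install without confirmation" });
    }
    // Shell metacharacters in arguments suggest command chaining.
    if (args.some((arg) => /&&|\||;/.test(arg))) {
      findings.push({ server: name, issue: "shell metacharacters in arguments" });
    }
    // 0.0.0.0 exposes the server on every network interface.
    if (args.some((arg) => arg.includes("0.0.0.0"))) {
      findings.push({ server: name, issue: "binds to 0.0.0.0 instead of localhost" });
    }
  }
  return findings;
}
```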

Agent Config Review (25 rules)

| What | Examples |
| --- | --- |
| Unrestricted tools | Agents with Bash access, no `allowedTools` restriction |
| Prompt injection surface | Agents processing external/user-provided content without defenses |
| Auto-run instructions | `CLAUDE.md` containing "Always run", "without asking", "automatically install" |
| Hidden instructions | Unicode zero-width characters, HTML comments, base64-encoded directives |
| URL execution | `CLAUDE.md` instructing agents to fetch and execute remote URLs |
| Time bombs | Delayed execution instructions triggered by time or absence conditions |
| Data harvesting | Bulk collection of passwords, credentials, or database dumps |
| Prompt reflection | `ignore previous instructions`, `you are now`, DAN jailbreak, fake system prompts |
| Output manipulation | `always report ok`, `remove warnings from output`, suppress security findings |
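
A few of these checks reduce to text heuristics over files such as `CLAUDE.md`. The sketch below is one possible shape. The phrase lists are taken from the table above, while the function name `reviewAgentPrompt` and the issue labels are hypothetical.

```typescript
// Heuristics only; the issue labels and exact regexes are assumptions.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/;
const HTML_COMMENT = /<!--[\s\S]*?-->/;
const INJECTION_PHRASES = [/ignore (all )?previous instructions/i, /you are now\b/i];
const AUTO_RUN_PHRASES = [/always run/i, /without asking/i, /automatically install/i];

function reviewAgentPrompt(text: string): string[] {
  const issues: string[] = [];
  // Zero-width characters and HTML comments can hide text from a human reviewer.
  if (ZERO_WIDTH.test(text)) issues.push("hidden-instructions: zero-width characters");
  if (HTML_COMMENT.test(text)) issues.push("hidden-instructions: HTML comment");
  // Phrases that try to override the agent's existing instructions.
  if (INJECTION_PHRASES.some((re) => re.test(text))) issues.push("prompt-reflection");
  // Phrases that push the agent to act without user confirmation.
  if (AUTO_RUN_PHRASES.some((re) => re.test(text))) issues.push("auto-run-instructions");
  return issues;
}
```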

Features

Auto-Fix Engine (`--fix`)

Automatically applies safe fixes:

- Replaces hardcoded secrets with `${ENV_VAR}` references
- Tightens wildcard permissions (`Bash(*)` → scoped `Bash(git *)`, `Bash(npm *)`)

Only fixes marked auto: true are applied. Permission changes require human review.
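
The gating on `auto: true` can be pictured with a small sketch. The `Fix` shape and the `applySafeFixes` function are hypothetical stand-ins, not the engine's real types. Only the "skip anything not marked auto" behavior is taken from the text above.

```typescript
// Hypothetical fix shape: only entries with auto === true are applied.
interface Fix {
  auto: boolean;
  secret: string; // literal value found in the file
  envVar: string; // name of the environment variable to reference instead
}

function applySafeFixes(content: string, fixes: Fix[]): { content: string; applied: number } {
  let result = content;
  let applied = 0;
  for (const fix of fixes) {
    // Skip fixes that need human review and fixes whose target is absent.
    if (!fix.auto || !result.includes(fix.secret)) continue;
    // split/join replaces every occurrence without regex-escaping the secret.
    result = result.split(fix.secret).join("${" + fix.envVar + "}");
    applied += 1;
  }
  return { content: result, applied };
}
```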

Secure Init (`agentshield init`)

Generates a hardened .claude/ directory with scoped permissions, safety hooks, and security best practices. Existing files are never overwritten.

Opus 4.6 Deep Analysis (`--opus`)

Three-agent adversarial pipeline powered by Claude Opus 4.6:

1. Red Team (Attacker): finds exploitable attack vectors and multi-step chains
2. Blue Team (Defender): evaluates existing protections and recommends hardening
3. Auditor: synthesizes both perspectives into a prioritized risk assessment

For example, the Attacker finds that a curl hook using `${file}` interpolation, combined with `Bash(*)`, gives a command injection pivot. The Defender notes that no PreToolUse hooks exist to stop it. The Auditor chains the two findings into a prioritized action list.
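
The control flow of such a pipeline can be sketched as follows. The `Agent` type is a stand-in for a model call, and `runPipeline` is a hypothetical name. No real API client is used here, and the prompt format passed to the auditor is an assumption.

```typescript
// Stand-in for a model call; a real pipeline would call the Anthropic API here.
type Agent = (input: string) => Promise<string>;

interface PipelineResult {
  attacker: string;
  defender: string;
  audit: string;
}

async function runPipeline(
  config: string,
  attacker: Agent,
  defender: Agent,
  auditor: Agent,
): Promise<PipelineResult> {
  // Phase 1: red and blue teams analyze the same config independently, in parallel.
  const [attack, defense] = await Promise.all([attacker(config), defender(config)]);
  // Phase 2: the auditor sees both reports and produces the final assessment.
  const audit = await auditor(`ATTACKER:\n${attack}\n\nDEFENDER:\n${defense}`);
  return { attacker: attack, defender: defense, audit };
}
```

Running the first two agents with `Promise.all` reflects the parallel mode. A streaming variant would presumably await them one after another so output can be printed in order.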

agentshield scan --opus              # Red + Blue run in parallel
agentshield scan --opus --stream     # Sequential with real-time output
agentshield scan --opus --stream -v  # Verbose — see full agent reasoning

  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃  Phase 1a: ATTACKER (Red Team)                       ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

  ✓ Attacker analysis complete (4521 tokens)

  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃  Phase 1b: DEFENDER (Blue Team)                      ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

  ✓ Defender analysis complete (3892 tokens)

  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃  Phase 2: AUDITOR                                    ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

  Risk Level: CRITICAL
  Opus Score: █████░░░░░░░░░░░░░░░ 15/100

Requires ANTHROPIC_API_KEY environment variable.

Output Formats

| Format | Flag | Use Case |
| --- | --- | --- |
| Terminal | `--format terminal` (default) | Interactive use |
| JSON | `--format json` | CI pipelines, programmatic access |
| Markdown | `--format markdown` | Documentation, PRs |
| HTML | `--format html` | Self-contained shareable report (dark theme, all CSS inlined) |

GitHub Action

- name: AgentShield Security Scan
  uses: affaan-m/agentshield@v1
  with:
    path: "."
    min-severity: "medium"
    fail-on-findings: "true"

Inputs:

| Input | Default | Description |
| --- | --- | --- |
| `path` | `.` | Path to scan |
| `min-severity` | `medium` | Minimum severity: critical, high, medium, low, info |
| `fail-on-findings` | `true` | Fail the action if findings meet severity threshold |
| `format` | `terminal` | Output format |

Outputs: score (0–100), grade (A–F), total-findings, critical-count

The action writes a markdown job summary and emits GitHub annotations inline on affected files.

CLI Reference

agentshield scan [options]         Scan configuration directory
  -p, --path <path>                Path to scan (default: ~/.claude or cwd)
  -f, --format <format>            Output: terminal, json, markdown, html
  --fix                            Auto-apply safe fixes
  --opus                           Enable Opus 4.6 multi-agent analysis
  --stream                         Stream Opus analysis in real-time
  --min-severity <severity>        Filter: critical, high, medium, low, info
  -v, --verbose                    Show detailed output

agentshield init                   Generate secure baseline config

agentshield miniclaw start [opts]  Launch MiniClaw secure agent server
  -p, --port <port>                Port (default: 3847)
  -H, --hostname <host>            Hostname (default: localhost)
  --network <policy>               Network: none, localhost, allowlist
  --rate-limit <n>                 Max req/min per IP (default: 10)
  --sandbox-root <path>            Root path for sandboxes
  --max-duration <ms>              Max session duration (default: 300000)

Security Rules Summary

| Category | Rules | Patterns | Severity Range |
| --- | --- | --- | --- |
| Secrets | 10 | 14 | Critical to Medium |
| Permissions | 10 | -- | Critical to Medium |
| Hooks | 34 | -- | Critical to Low |
| MCP Servers | 23 | -- | Critical to Info |
| Agents | 25 | -- | Critical to Info |
| Total | 102 | 14 | |

Architecture

src/
├── index.ts              CLI entry point (commander)
├── action.ts             GitHub Action entry point
├── types.ts              Type system + Zod schemas
├── scanner/
│   ├── discovery.ts      Config file discovery
│   └── index.ts          Scan orchestrator
├── rules/
│   ├── index.ts          Rule registry
│   ├── secrets.ts        Secret detection (10 rules, 14 patterns)
│   ├── permissions.ts    Permission audit (10 rules)
│   ├── mcp.ts            MCP server security (23 rules)
│   ├── hooks.ts          Hook analysis (34 rules)
│   └── agents.ts         Agent config review (25 rules)
├── reporter/
│   ├── score.ts          Scoring engine (A-F grades)
│   ├── terminal.ts       Color terminal output
│   ├── json.ts           JSON + Markdown output
│   └── html.ts           Self-contained HTML report
├── fixer/
│   ├── transforms.ts     Fix transforms (secret, permission, generic)
│   └── index.ts          Fix engine orchestrator
├── init/
│   └── index.ts          Secure config generator
├── opus/
│   ├── prompts.ts        Attacker/Defender/Auditor system prompts
│   ├── pipeline.ts       Three-agent Opus 4.6 pipeline
│   └── render.ts         Opus analysis rendering
└── miniclaw/
    ├── types.ts          Core type system (immutable, readonly)
    ├── sandbox.ts        Sandbox lifecycle + path validation
    ├── router.ts         Prompt sanitization + output filtering
    ├── tools.ts          Whitelist-based tool authorization
    ├── server.ts         HTTP server with rate limiting + CORS
    ├── dashboard.tsx     React dashboard component
    └── index.ts          Entry point and re-exports

MiniClaw

MiniClaw is a minimal, sandboxed AI agent runtime bundled with AgentShield. Where typical agent platforms expose many attack surfaces (Telegram, Discord, email, community plugins), MiniClaw presents a single HTTP endpoint backed by an isolated sandbox.

# Start with secure defaults (localhost:3847, no network, safe tools only)
npx ecc-agentshield miniclaw start

# Custom configuration
npx ecc-agentshield miniclaw start --port 4000 --network localhost --rate-limit 20

Or use as a library:

import { startMiniClaw } from 'ecc-agentshield/miniclaw';

const { server, stop } = startMiniClaw();
// Listening on http://localhost:3847

Security Model

Four independently enforced layers:

Request → [Rate Limit] → [CORS] → [Size Cap] → [Sanitize Prompt]
                                                       ↓
                                                 [Tool Whitelist]
                                                       ↓
                                                   [Sandbox FS]
                                                       ↓
                                                 [Filter Output] → Response

1. Server: rate limiting (10 req/min/IP), CORS, 10KB request cap, localhost-only binding
2. Prompt Router: strips 12+ injection pattern categories (system prompt overrides, identity reassignment, jailbreaks, data exfiltration URLs, zero-width Unicode, base64 payloads)
3. Tool Whitelist: three tiers, Safe (read/search/list), Guarded (write/edit), Restricted (bash/network, disabled by default)
4. Sandbox: isolated filesystem per session, path traversal blocked, symlink escape detection, extension whitelist, 10MB file cap, 5-min timeout, no network by default
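
Two of these layers are simple enough to sketch with Node.js built-ins: per-IP rate limiting and path containment. The fixed-window strategy and the names `RateLimiter` and `isInsideSandbox` are assumptions. MiniClaw's actual implementation may differ, and symlink escape detection is not covered here.

```typescript
import * as path from "node:path";

// Fixed-window limiter; the windowing strategy is an assumption.
class RateLimiter {
  private readonly hits = new Map<string, { count: number; windowStart: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 10, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if this request from `ip` is within the per-window limit.
  allow(ip: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(ip, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}

// Resolve the requested path against the sandbox root and reject escapes.
// Symlink checks (for example via fs.realpath) would be a separate step.
function isInsideSandbox(sandboxRoot: string, requested: string): boolean {
  const root = path.resolve(sandboxRoot);
  const target = path.resolve(root, requested);
  return target === root || target.startsWith(root + path.sep);
}
```

Comparing against `root + path.sep` rather than `root` alone keeps a sibling directory such as `/tmp/sbx-evil` from passing as a child of `/tmp/sbx`.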

API

| Method | Endpoint | Description |
| --- | --- | --- |
| `POST` | `/api/prompt` | Send a prompt |
| `POST` | `/api/session` | Create a sandboxed session |
| `GET` | `/api/session` | Session info |
| `DELETE` | `/api/session/:id` | Destroy session + cleanup |
| `GET` | `/api/events/:sessionId` | Security audit events |
| `GET` | `/api/health` | Health check |

MiniClaw has zero external runtime dependencies — Node.js built-ins only (http, fs, path, crypto). The optional React dashboard requires React 18+ as a peer dependency.

Development

npm install          # Install dependencies
npm run dev          # Development mode
npm test             # Run tests (912 tests)
npm run test:coverage # Coverage report
npm run typecheck    # Type check
npm run build        # Build
npm run scan:demo    # Demo scan against vulnerable examples

License

MIT

Built by @affaanmustafa · Part of Everything Claude Code
