Skip to content

Security auditor for AI agent configurations. Scans Claude Code setups for vulnerabilities, misconfigs, and injection risks.

License

Notifications You must be signed in to change notification settings

affaan-m/agentshield

Repository files navigation

AgentShield

Security auditor for AI agent configurations

Scans Claude Code setups for hardcoded secrets, permission misconfigs,
hook injection, MCP server risks, and agent prompt injection vectors.

npm version npm downloads tests coverage License: MIT

Quick Start · What It Catches · Opus Pipeline · GitHub Action · MiniClaw


Why

The AI agent ecosystem is growing faster than its security tooling. In January 2026 alone:

  • 12% of a major agent skill marketplace was malicious (341 of 2,857 community skills)
  • A CVSS 8.8 CVE exposed 17,500+ internet-facing instances to one-click RCE
  • The Moltbook breach compromised 1.5M API tokens across 770,000 agents

Developers install community skills, connect MCP servers, and configure hooks without any automated way to audit the security of their setup. AgentShield scans your .claude/ directory and flags vulnerabilities before they become exploits.

Built at the Claude Code Hackathon (Cerebral Valley x Anthropic, Feb 2026). Part of the Everything Claude Code ecosystem (42K+ stars).

Quick Start

# Scan your Claude Code config (no install required)
npx ecc-agentshield scan

# Or install globally
npm install -g ecc-agentshield
agentshield scan

That's it. AgentShield auto-discovers your ~/.claude/ directory, scans all config files, and prints a graded security report.

  AgentShield Security Report

  Grade: F (0/100)

  Score Breakdown
  Secrets        ░░░░░░░░░░░░░░░░░░░░ 0
  Permissions    ░░░░░░░░░░░░░░░░░░░░ 0
  Hooks          ░░░░░░░░░░░░░░░░░░░░ 0
  MCP Servers    ░░░░░░░░░░░░░░░░░░░░ 0
  Agents         ░░░░░░░░░░░░░░░░░░░░ 0

  ● CRITICAL  Hardcoded Anthropic API key
    CLAUDE.md:13
    Evidence: sk-ant-a...cdef
    Fix: Replace with environment variable reference [auto-fixable]

  ● CRITICAL  Overly permissive allow rule: Bash(*)
    settings.json
    Evidence: Bash(*)
    Fix: Restrict to specific commands: Bash(git *), Bash(npm *), Bash(node *)

  Summary
  Files scanned: 6
  Findings: 73 total — 19 critical, 29 high, 15 medium, 4 low, 6 info
  Auto-fixable: 8 (use --fix)

More commands

# Scan a specific directory
agentshield scan --path /path/to/.claude

# Auto-fix safe issues (replaces hardcoded secrets with env var references)
agentshield scan --fix

# JSON output for CI pipelines
agentshield scan --format json

# Generate an HTML security report
agentshield scan --format html > report.html

# Three-agent Opus 4.6 adversarial analysis (requires ANTHROPIC_API_KEY)
agentshield scan --opus --stream

# Generate a secure baseline config
agentshield init

What It Catches

102 rules across 5 categories, graded A–F with a 0–100 numeric score.

Secrets Detection (10 rules, 14 patterns)

What Examples
API keys Anthropic (sk-ant-), OpenAI (sk-proj-), AWS (AKIA), Google (AIza), Stripe (sk_test_/sk_live_)
Tokens GitHub PATs (ghp_/github_pat_), Slack (xox[bprs]-), JWTs (eyJ...), Bearer tokens
Credentials Hardcoded passwords, database connection strings (postgres/mongo/mysql/redis), private key material
Env leaks Secrets passed through environment variables in configs, echo $SECRET in hooks

Permission Audit (10 rules)

What Examples
Wildcard access Bash(*), Write(*), Edit(*) — unrestricted tool permissions
Missing deny lists No deny rules for rm -rf, sudo, chmod 777
Dangerous flags --dangerously-skip-permissions usage
Mutable tool exposure All mutable tools (Write, Edit, Bash) allowed without scoping
Destructive git git push --force, git reset --hard in allowed commands
Unrestricted network curl *, wget, ssh *, scp * in allow list without scope

Hook Analysis (34 rules)

What Examples
Command injection ${file} interpolation in shell commands — attacker-controlled filenames become code
Data exfiltration curl -X POST with variable interpolation sending data to external URLs
Silent errors 2>/dev/null, || true — failing security hooks that silently pass
Missing hooks No PreToolUse hooks, no Stop hooks for session-end validation
Network exposure Unthrottled network requests in hooks, sensitive file access without filtering
Session startup SessionStart hooks that download and execute remote scripts
Package installs Global npm install -g, pip install, gem install, cargo install in hooks
Container escape Docker --privileged, --pid=host, --network=host, root volume mounts
Credential access macOS Keychain, GNOME Keyring, /etc/shadow reads
Reverse shells /dev/tcp, mkfifo + nc, Python/Perl socket shells
Clipboard access pbcopy, xclip, xsel, wl-copy — exfiltration via clipboard
Log tampering journalctl --vacuum, rm /var/log, history -c — anti-forensics

MCP Server Security (23 rules)

What Examples
High-risk servers Shell/command MCPs, filesystem with root access, database MCPs, browser automation
Supply chain npx -y auto-install without confirmation — typosquatting vector
Hardcoded secrets API tokens in MCP environment config instead of env var references
Remote transport MCP servers connecting to remote URLs (SSE/streamable HTTP)
Shell metacharacters &&, |, ; in MCP server command arguments
Missing metadata No version pin, no description, excessive server count
Sensitive file args .env, .pem, credentials.json passed as server arguments
Network exposure Binding to 0.0.0.0 instead of localhost
Auto-approve autoApprove settings that skip user confirmation for tool calls
Missing timeouts High-risk servers without timeout — resource exhaustion risk

Agent Config Review (25 rules)

What Examples
Unrestricted tools Agents with Bash access, no allowedTools restriction
Prompt injection surface Agents processing external/user-provided content without defenses
Auto-run instructions CLAUDE.md containing "Always run", "without asking", "automatically install"
Hidden instructions Unicode zero-width characters, HTML comments, base64-encoded directives
URL execution CLAUDE.md instructing agents to fetch and execute remote URLs
Time bombs Delayed execution instructions triggered by time or absence conditions
Data harvesting Bulk collection of passwords, credentials, or database dumps
Prompt reflection ignore previous instructions, you are now, DAN jailbreak, fake system prompts
Output manipulation always report ok, remove warnings from output, suppress security findings

Features

Auto-Fix Engine (--fix)

Automatically applies safe fixes:

  • Replaces hardcoded secrets with ${ENV_VAR} references
  • Tightens wildcard permissions (Bash(*) → scoped Bash(git *), Bash(npm *))

Only fixes marked auto: true are applied. Permission changes require human review.

Secure Init (agentshield init)

Generates a hardened .claude/ directory with scoped permissions, safety hooks, and security best practices. Existing files are never overwritten.

Opus 4.6 Deep Analysis (--opus)

Three-agent adversarial pipeline powered by Claude Opus 4.6:

  1. Red Team (Attacker) — finds exploitable attack vectors and multi-step chains
  2. Blue Team (Defender) — evaluates existing protections and recommends hardening
  3. Auditor — synthesizes both perspectives into a prioritized risk assessment

The Attacker finds that curl hooks with ${file} interpolation + Bash(*) = command injection pivot. The Defender notes no PreToolUse hooks exist to stop it. The Auditor chains them into a prioritized action list.

agentshield scan --opus              # Red + Blue run in parallel
agentshield scan --opus --stream     # Sequential with real-time output
agentshield scan --opus --stream -v  # Verbose — see full agent reasoning
  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃  Phase 1a: ATTACKER (Red Team)                       ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

  ✓ Attacker analysis complete (4521 tokens)

  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃  Phase 1b: DEFENDER (Blue Team)                      ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

  ✓ Defender analysis complete (3892 tokens)

  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃  Phase 2: AUDITOR                                    ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

  Risk Level: CRITICAL
  Opus Score: █████░░░░░░░░░░░░░░░ 15/100

Requires ANTHROPIC_API_KEY environment variable.

Output Formats

Format Flag Use Case
Terminal --format terminal (default) Interactive use
JSON --format json CI pipelines, programmatic access
Markdown --format markdown Documentation, PRs
HTML --format html Self-contained shareable report (dark theme, all CSS inlined)

GitHub Action

- name: AgentShield Security Scan
  uses: affaan-m/agentshield@v1
  with:
    path: "."
    min-severity: "medium"
    fail-on-findings: "true"

Inputs:

Input Default Description
path . Path to scan
min-severity medium Minimum severity: critical, high, medium, low, info
fail-on-findings true Fail the action if findings meet severity threshold
format terminal Output format

Outputs: score (0–100), grade (A–F), total-findings, critical-count

The action writes a markdown job summary and emits GitHub annotations inline on affected files.

CLI Reference

agentshield scan [options]         Scan configuration directory
  -p, --path <path>                Path to scan (default: ~/.claude or cwd)
  -f, --format <format>            Output: terminal, json, markdown, html
  --fix                            Auto-apply safe fixes
  --opus                           Enable Opus 4.6 multi-agent analysis
  --stream                         Stream Opus analysis in real-time
  --min-severity <severity>        Filter: critical, high, medium, low, info
  -v, --verbose                    Show detailed output

agentshield init                   Generate secure baseline config

agentshield miniclaw start [opts]  Launch MiniClaw secure agent server
  -p, --port <port>                Port (default: 3847)
  -H, --hostname <host>            Hostname (default: localhost)
  --network <policy>               Network: none, localhost, allowlist
  --rate-limit <n>                 Max req/min per IP (default: 10)
  --sandbox-root <path>            Root path for sandboxes
  --max-duration <ms>              Max session duration (default: 300000)

Security Rules Summary

Category Rules Patterns Severity Range
Secrets 10 14 Critical -- Medium
Permissions 10 -- Critical -- Medium
Hooks 34 -- Critical -- Low
MCP Servers 23 -- Critical -- Info
Agents 25 -- Critical -- Info
Total 102 14

Architecture

src/
├── index.ts              CLI entry point (commander)
├── action.ts             GitHub Action entry point
├── types.ts              Type system + Zod schemas
├── scanner/
│   ├── discovery.ts      Config file discovery
│   └── index.ts          Scan orchestrator
├── rules/
│   ├── index.ts          Rule registry
│   ├── secrets.ts        Secret detection (10 rules, 14 patterns)
│   ├── permissions.ts    Permission audit (10 rules)
│   ├── mcp.ts            MCP server security (23 rules)
│   ├── hooks.ts          Hook analysis (34 rules)
│   └── agents.ts         Agent config review (25 rules)
├── reporter/
│   ├── score.ts          Scoring engine (A-F grades)
│   ├── terminal.ts       Color terminal output
│   ├── json.ts           JSON + Markdown output
│   └── html.ts           Self-contained HTML report
├── fixer/
│   ├── transforms.ts     Fix transforms (secret, permission, generic)
│   └── index.ts          Fix engine orchestrator
├── init/
│   └── index.ts          Secure config generator
├── opus/
│   ├── prompts.ts        Attacker/Defender/Auditor system prompts
│   ├── pipeline.ts       Three-agent Opus 4.6 pipeline
│   └── render.ts         Opus analysis rendering
└── miniclaw/
    ├── types.ts          Core type system (immutable, readonly)
    ├── sandbox.ts        Sandbox lifecycle + path validation
    ├── router.ts         Prompt sanitization + output filtering
    ├── tools.ts          Whitelist-based tool authorization
    ├── server.ts         HTTP server with rate limiting + CORS
    ├── dashboard.tsx     React dashboard component
    └── index.ts          Entry point and re-exports

MiniClaw

MiniClaw is a minimal, sandboxed AI agent runtime bundled with AgentShield. Where typical agent platforms expose many attack surfaces (Telegram, Discord, email, community plugins), MiniClaw presents a single HTTP endpoint backed by an isolated sandbox.

# Start with secure defaults (localhost:3847, no network, safe tools only)
npx ecc-agentshield miniclaw start

# Custom configuration
npx ecc-agentshield miniclaw start --port 4000 --network localhost --rate-limit 20

Or use as a library:

import { startMiniClaw } from 'ecc-agentshield/miniclaw';

const { server, stop } = startMiniClaw();
// Listening on http://localhost:3847

Security Model

Four independently enforced layers:

Request → [Rate Limit] → [CORS] → [Size Cap] → [Sanitize Prompt]
                                                       ↓
                                                 [Tool Whitelist]
                                                       ↓
                                                   [Sandbox FS]
                                                       ↓
                                                 [Filter Output] → Response
  • Server — Rate limiting (10 req/min/IP), CORS, 10KB request cap, localhost-only binding
  • Prompt Router — Strips 12+ injection pattern categories (system prompt overrides, identity reassignment, jailbreaks, data exfiltration URLs, zero-width Unicode, base64 payloads)
  • Tool Whitelist — Three tiers: Safe (read/search/list), Guarded (write/edit), Restricted (bash/network — disabled by default)
  • Sandbox — Isolated filesystem per session, path traversal blocked, symlink escape detection, extension whitelist, 10MB file cap, 5-min timeout, no network by default

API

Method Endpoint Description
POST /api/prompt Send a prompt
POST /api/session Create a sandboxed session
GET /api/session Session info
DELETE /api/session/:id Destroy session + cleanup
GET /api/events/:sessionId Security audit events
GET /api/health Health check

MiniClaw has zero external runtime dependencies — Node.js built-ins only (http, fs, path, crypto). The optional React dashboard requires React 18+ as a peer dependency.

Development

npm install          # Install dependencies
npm run dev          # Development mode
npm test             # Run tests (912 tests)
npm run test:coverage # Coverage report
npm run typecheck    # Type check
npm run build        # Build
npm run scan:demo    # Demo scan against vulnerable examples

License

MIT


About

Security auditor for AI agent configurations. Scans Claude Code setups for vulnerabilities, misconfigs, and injection risks.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published

Languages