OWASP Agentic AI Top 10 -- Security Assessment
Hi team,
We conducted an OWASP Agentic AI Top 10 (2025) assessment of 39 popular AI agent frameworks as part of ongoing agentic security research. This assessment was performed via static analysis of public source code only -- no systems were accessed or tested remotely.
Assessment Results -- Devika
| Check | OWASP ID | Severity | Detail |
|---|---|---|---|
| Unsafe Execution | AA-03 | CRITICAL | exec(), subprocess calls, shell command execution for code generation and testing |
| Inadequate Sandboxing | AA-09 | HIGH | Code execution on host without process isolation or containerisation |
| Excessive Agency | AA-01 | MEDIUM | Autonomous web browsing (Playwright), code generation, file creation, and execution |
Risk Score: 65/100 (FAIL)
Why This Matters
Devika is an open-source AI software engineer (an alternative to Devin) with 19K+ GitHub stars. The framework combines several high-risk patterns:
- Code execution: Generates and executes code directly on the host via subprocess and exec()
- Browser automation: Uses Playwright for autonomous web browsing, research, and data extraction
- File system access: Creates, modifies, and deletes files on the host filesystem
- Shell commands: Executes arbitrary shell commands for project setup, testing, and dependency installation
- No sandboxing: All operations run directly on the host machine
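Together, code generation, code execution, web browsing, and filesystem access create a broad attack surface. A prompt injection via a malicious webpage could lead to arbitrary code execution on the host: instructions planted in a page the agent browses can steer the code it generates, and that code then runs unsandboxed with the privileges of the Devika process.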
Recommended Mitigations
- Containerised execution: Run generated code in a Docker container or other sandbox rather than directly on the host
- Network isolation: Separate browser automation from code execution environment
- File system restrictions: Limit writes to project directory, block access to sensitive paths
- Code review gate: Show generated code to user for approval before execution
- Browser sandboxing: Restrict Playwright to allowlisted domains
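For the file system restriction, one option is to route every agent write through a path check that resolves the target and refuses anything outside the project root. This is a minimal sketch, not Devika's code: `PROJECT_ROOT`, `SENSITIVE_NAMES` and `resolve_safe_path` are hypothetical names, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path

# Hypothetical workspace root; a real integration would use the agent's project directory.
PROJECT_ROOT = Path("/tmp/devika-workspace").resolve()

# Hypothetical deny-list of path components that stay blocked even inside the root.
SENSITIVE_NAMES = {".ssh", ".aws", ".env", ".git"}


def resolve_safe_path(relative: str, root: Path = PROJECT_ROOT) -> Path:
    """Resolve `relative` against `root`, refusing paths that escape it or hit sensitive names."""
    # resolve() collapses ".." and follows symlinks, so "../../etc/passwd" cannot sneak out.
    # An absolute input such as "/etc/passwd" replaces `root` entirely and is caught below.
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"write outside project root blocked: {relative}")
    if any(part in SENSITIVE_NAMES for part in candidate.relative_to(root).parts):
        raise PermissionError(f"write to sensitive path blocked: {relative}")
    return candidate
```

The agent's file-writing tool would then call `resolve_safe_path(...)` before opening any file, so a traversal such as `../../etc/passwd` or a write to `.ssh/authorized_keys` fails with `PermissionError` instead of touching the host.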
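A code review gate can be as small as a hook that shows the generated code and requires an explicit yes before anything runs. The sketch below is illustrative only: `ask_user` and `run_generated_python` are hypothetical names, and the child process still runs on the host, so this gates execution but does not sandbox it.

```python
import subprocess
import sys
from typing import Callable


def ask_user(code: str) -> bool:
    """Default approval hook: print the generated code and require an explicit 'y'."""
    print("----- generated code -----")
    print(code)
    return input("Run this code? [y/N] ").strip().lower() == "y"


def run_generated_python(
    code: str,
    workdir: str,
    approve: Callable[[str], bool] = ask_user,
    timeout_s: float = 30.0,
) -> subprocess.CompletedProcess:
    """Execute LLM-generated Python in a child interpreter, but only after approval."""
    if not approve(code):
        raise PermissionError("execution rejected by reviewer")
    # An argument list (no shell=True) keeps the code away from shell parsing;
    # "-I" starts Python in isolated mode (ignores PYTHON* env vars and user site-packages);
    # the timeout stops a runaway script from hanging the agent.
    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
```

Making `approve` a parameter keeps the gate testable and lets a UI replace the terminal prompt with an approval dialog.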
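Containerised execution and network isolation can be combined by running generated code in a throwaway Docker container with networking disabled and only the project directory mounted. This sketch assumes the Docker CLI is installed and uses `python:3.12-slim` as an example image; `docker_run_command` and `run_in_container` are hypothetical helpers, and the resource limits are placeholder values.

```python
import os
import subprocess
from typing import List


def docker_run_command(workdir: str, code: str, image: str = "python:3.12-slim") -> List[str]:
    """Build a `docker run` argument list that executes `code` in a locked-down container."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                 # no network: the code cannot fetch payloads or exfiltrate
        "--read-only",                       # container root filesystem is read-only
        "--tmpfs", "/tmp",                   # scratch space that disappears with the container
        "--cap-drop", "ALL",                 # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", "128",               # placeholder limits; tune per workload
        "--memory", "512m",
        "--cpus", "1",
        # Only the project directory is mounted (writable); bind mounts need an absolute path.
        "-v", f"{os.path.abspath(workdir)}:/workspace",
        "-w", "/workspace",
        image,
        "python", "-c", code,
    ]


def run_in_container(workdir: str, code: str, timeout_s: float = 60.0) -> subprocess.CompletedProcess:
    """Run generated code via the Docker CLI instead of directly on the host."""
    return subprocess.run(
        docker_run_command(workdir, code),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
```

Building the argument list separately keeps the isolation policy in one auditable place. The browser automation would run in a different container or process that does have network access, which keeps web content and code execution apart.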
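For browser sandboxing, Playwright can intercept every request through `page.route`, so the agent's browser can abort anything outside an allowlist. The sketch below assumes Playwright's sync Python API (`page.route`, `route.request.url`, `route.continue_()`, `route.abort()`). `ALLOWED_DOMAINS`, `is_allowed` and `install_allowlist` are hypothetical names, and the domains are examples only.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_DOMAINS = {"docs.python.org", "github.com", "stackoverflow.com"}


def is_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """True only for http(s) URLs whose host is an allowlisted domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Exact or dot-suffix match, so "evilgithub.com" and "github.com.evil.io" are rejected.
    return any(host == d or host.endswith("." + d) for d in allowed)


def install_allowlist(page, allowed=ALLOWED_DOMAINS) -> None:
    """Abort every request from a Playwright (sync API) page whose URL is not allowlisted."""

    def handle(route):
        if is_allowed(route.request.url, allowed):
            route.continue_()
        else:
            route.abort()

    page.route("**/*", handle)
```

Calling `install_allowlist(page)` right after the agent creates its page covers navigations and sub-resources alike. A prompt-injected instruction to visit an attacker's site then fails at the network layer instead of relying on the model to refuse.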
Context
This is not a vulnerability disclosure. This assessment maps publicly visible code patterns to the OWASP Agentic AI Top 10 framework.
Full results for all 39 frameworks: registry.agentsign.dev
Happy to discuss any of these findings.
Raza Sharif
Founder, CyberSecAI Ltd
agentsign.dev