OWASP Agentic AI Security Assessment -- Devika

### OWASP Agentic AI Top 10 -- Security Assessment

Hi team,

We conducted an OWASP Agentic AI Top 10 (2025) assessment of 39 popular AI agent frameworks as part of ongoing agentic security research. This assessment was performed via **static analysis of public source code only** -- no systems were accessed or tested remotely.

---

#### Assessment Results -- Devika

| Check | OWASP ID | Severity | Detail |
|-------|----------|----------|--------|
| Unsafe Execution | AA-03 | CRITICAL | exec(), subprocess calls, shell command execution for code generation and testing |
| Inadequate Sandboxing | AA-09 | HIGH | Code execution on host without process isolation or containerisation |
| Excessive Agency | AA-01 | MEDIUM | Autonomous web browsing (Playwright), code generation, file creation, and execution |

**Risk Score: 65/100 (FAIL)**

---

#### Why This Matters

Devika is an open-source AI software engineer (a Devin alternative) with 19K+ GitHub stars. The framework combines multiple high-risk patterns:

- **Code execution**: Generates and executes code directly on the host via subprocess and exec()
- **Browser automation**: Uses Playwright for autonomous web browsing, research, and data extraction
- **File system access**: Creates, modifies, and deletes files on the host filesystem
- **Shell commands**: Executes arbitrary shell commands for project setup, testing, and dependency installation
- **No sandboxing**: All operations run directly on the host machine

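The combination of code generation + execution + web browsing + filesystem access creates a broad attack surface. If content fetched during browsing reaches the model that generates code, and that code then runs unsandboxed, a prompt injection planted in a malicious webpage could lead to arbitrary code execution on the host.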

---

#### Recommended Mitigations

1. **Containerised execution**: Run generated code in a Docker container or other sandbox rather than directly on the host
2. **Network isolation**: Separate browser automation from the code execution environment
3. **File system restrictions**: Limit writes to the project directory and block access to sensitive paths
4. **Code review gate**: Show generated code to the user for approval before execution
5. **Browser sandboxing**: Restrict Playwright to allowlisted domains
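
As a rough illustration of mitigations 1, 3, and 5, the checks could look something like the sketch below. It is not taken from Devika's codebase: the function names, the `ALLOWED_DOMAINS` set, the `/workspace` mount point, and the resource limits are all hypothetical choices made for the example, and the Docker flags shown are only one reasonable hardening baseline.

```python
from pathlib import Path
from typing import List
from urllib.parse import urlparse

# Hypothetical allowlist for illustration; a real deployment would choose its own.
ALLOWED_DOMAINS = {"docs.python.org", "pypi.org"}


def is_within_project(project_root: str, candidate: str) -> bool:
    """Return True only if `candidate` resolves to a path inside `project_root`.

    Resolving first collapses `..` segments and follows symlinks, so a path
    like `../../etc/passwd` or an absolute path outside the root is rejected.
    """
    root = Path(project_root).resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents


def is_allowed_url(url: str) -> bool:
    """Allow only http(s) URLs whose host is an allowlisted domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


def docker_argv(project_root: str, image: str, command: List[str]) -> List[str]:
    """Build (but do not run) a `docker run` command line for generated code.

    Assumed hardening choices: no network, read-only root filesystem, all
    capabilities dropped, and only the project directory mounted (writable).
    """
    root = str(Path(project_root).resolve())
    return [
        "docker", "run", "--rm",
        "--network", "none",                     # no outbound network from generated code
        "--read-only",                           # container root filesystem is read-only
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation via setuid
        "--pids-limit", "128",                   # illustrative limit against fork bombs
        "--memory", "512m",                      # illustrative memory cap
        "-v", f"{root}:/workspace",              # only the project directory is visible
        "-w", "/workspace",
        image, *command,
    ]
```

The argument list returned by `docker_argv` could be passed to `subprocess.run` without `shell=True`, so the generated command is never interpreted by a host shell. The URL check could be applied before each navigation, or inside a Playwright `page.route` handler that aborts requests to non-allowlisted hosts; which of these fits best depends on how the browsing component is structured.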

---

#### Context

This is not a vulnerability disclosure. This assessment maps publicly visible code patterns to the [OWASP Agentic AI Top 10](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/) framework.

Full results for all 39 frameworks: [registry.agentsign.dev](https://registry.agentsign.dev)

Happy to discuss any of these findings.

**Raza Sharif**
Founder, CyberSecAI Ltd
[agentsign.dev](https://agentsign.dev)
