OWASP Agentic AI Security Assessment -- Devika #700

@razashariff

OWASP Agentic AI Top 10 -- Security Assessment

Hi team,

We conducted an OWASP Agentic AI Top 10 (2025) assessment of 39 popular AI agent frameworks as part of ongoing agentic security research. This assessment was performed via static analysis of public source code only -- no systems were accessed or tested remotely.


Assessment Results -- Devika

| Check                 | OWASP ID | Severity | Detail |
|-----------------------|----------|----------|--------|
| Unsafe Execution      | AA-03    | CRITICAL | exec(), subprocess calls, shell command execution for code generation and testing |
| Inadequate Sandboxing | AA-09    | HIGH     | Code execution on host without process isolation or containerisation |
| Excessive Agency      | AA-01    | MEDIUM   | Autonomous web browsing (Playwright), code generation, file creation, and execution |

Risk Score: 65/100 (FAIL)


Why This Matters

Devika is a 19K+ star open-source AI software engineer (Devin alternative). The framework combines multiple high-risk patterns:

  • Code execution: Generates and executes code directly on the host via subprocess and exec()
  • Browser automation: Uses Playwright for autonomous web browsing, research, and data extraction
  • File system access: Creates, modifies, and deletes files on the host filesystem
  • Shell commands: Executes arbitrary shell commands for project setup, testing, and dependency installation
  • No sandboxing: All operations run directly on the host machine

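The combination of code generation, execution, web browsing, and filesystem access creates a broad attack surface. For example, a prompt injection delivered through a malicious webpage that the agent browses could steer what code is generated, and because that code runs unsandboxed, this could lead to arbitrary code execution on the host.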
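
To make the host-execution concern concrete, here is a minimal sketch of how generated code could be launched inside a locked-down container instead of directly via subprocess on the host. It is not taken from the Devika codebase: the function names, the `python:3.12-slim` image, the `/workspace` mount point, and the resource limits are all assumptions chosen for illustration, and it presumes Docker is installed.

```python
import subprocess
from pathlib import Path


def build_sandbox_command(project_dir: str, script: str,
                          image: str = "python:3.12-slim") -> list[str]:
    """Build a `docker run` argv that runs `script` in a restricted container.

    Hypothetical helper: image name, mount point, and limits are assumptions.
    """
    workspace = Path(project_dir).resolve()
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access from generated code
        "--read-only",                          # container root filesystem is read-only
        "--tmpfs", "/tmp",                      # scratch space that vanishes with the container
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid
        "--memory", "512m",                     # assumed memory cap
        "--pids-limit", "128",                  # assumed process cap
        "-v", f"{workspace}:/workspace:rw",     # only the project directory is writable
        "-w", "/workspace",
        image,
        "python", script,
    ]


def run_generated_code(project_dir: str, script: str,
                       timeout: int = 60) -> subprocess.CompletedProcess:
    """Run the script in the sandbox; argv is a list, so no shell is involved."""
    return subprocess.run(
        build_sandbox_command(project_dir, script),
        capture_output=True, text=True, timeout=timeout,
    )
```

With this shape, a successful injection would still run attacker-influenced code, but only inside a container with no network and write access limited to the mounted project directory.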


Recommended Mitigations

  1. Containerised execution: Run generated code in a Docker container or other sandbox rather than directly on the host
  2. Network isolation: Separate browser automation from the code execution environment
  3. File system restrictions: Limit writes to the project directory and block access to sensitive paths
  4. Code review gate: Show generated code to the user for approval before execution
  5. Browser sandboxing: Restrict Playwright to allowlisted domains
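
Mitigations 3, 4, and 5 can each be approximated with a small guard function. The sketch below is illustrative only and is not Devika's API: the function names and the example allowlist are hypothetical, and a real deployment would call checks like these before any file write, code execution, or Playwright navigation.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical allowlist; a real one would come from project configuration.
ALLOWED_DOMAINS = frozenset({"docs.python.org", "pypi.org"})


def is_within_project(project_dir: str, target: str) -> bool:
    """Return True only if `target` resolves to a path inside `project_dir`."""
    root = Path(project_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks before comparing
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents


def is_allowed_url(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True only for https URLs whose host is an allowlisted domain
    or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def approve_execution(code: str, ask=input) -> bool:
    """Show generated code and require an explicit 'y' before it may run."""
    print("Generated code pending approval:\n" + code)
    return ask("Run this code? [y/N] ").strip().lower() == "y"
```

The `ask` parameter defaults to `input` so the gate is interactive, but it can be replaced with a UI callback. Defaulting to "no" on anything other than an explicit `y` keeps the gate fail-closed.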

Context

This is not a vulnerability disclosure. This assessment maps publicly visible code patterns to the OWASP Agentic AI Top 10 framework.

Full results for all 39 frameworks: registry.agentsign.dev

Happy to discuss any of these findings.

Raza Sharif
Founder, CyberSecAI Ltd
agentsign.dev
