[Threat] Workspace files injected into system prompt create credential storage honeypot #1

@zeroaltitude

Description

Attack Scenario

OpenClaw loads user-editable workspace files (TOOLS.md, AGENTS.md, SOUL.md, USER.md, HEARTBEAT.md, IDENTITY.md) into the system prompt every turn. TOOLS.md is explicitly described as a place for "environment-specific" notes — camera names, SSH hosts, voice preferences, etc.
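For context, a simplified model of that mechanic (an assumption-laden sketch, not OpenClaw's actual implementation): each turn, the workspace files are concatenated into the system prompt, so whatever they contain travels with every LLM API call, including heartbeat polls and sub-agent spawns.

```python
from pathlib import Path

# Illustrative only: a simplified model of per-turn prompt assembly.
# File names are the ones listed above; the real loader may differ.
WORKSPACE_FILES = ["TOOLS.md", "AGENTS.md", "SOUL.md",
                   "USER.md", "HEARTBEAT.md", "IDENTITY.md"]

def build_system_prompt(workspace: Path) -> str:
    parts = []
    for name in WORKSPACE_FILES:
        path = workspace / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)

# Every user message, heartbeat tick, and sub-agent spawn rebuilds this
# prompt, so a password pasted into TOOLS.md ships with every API call.
```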

In practice, users naturally store operational secrets here too: API keys, database passwords, keyring passwords, SSH passphrases. It's the obvious place to put "things my agent needs to know." The default TOOLS.md template even has a "Secrets & Auth" section heading in some configurations.

Once a secret is in a workspace file, it's in the system prompt on every single LLM API call — every user message, every heartbeat poll (~30 min cycles), every sub-agent spawn. This means:

  1. Prompt injection extracts credentials, not just instructions. A successful direct or indirect injection (T-EXEC-001, T-EXEC-002) that gets the agent to disclose its system prompt now yields operational secrets — SSH passphrases, API keys, database credentials — not just the agent's personality and rules.

  2. The exposure is continuous. Secrets are sent to the LLM API on every heartbeat poll (~48 times per day at a 30-minute cadence, more once multi-call tool loops are counted), in addition to every user message and sub-agent spawn. They appear in API logs, session transcripts, and any monitoring/debugging output.

  3. The platform's UX encourages this pattern. TOOLS.md is documented as the right place for operational specifics. There's no warning that its contents become part of every API call, and no detection for accidentally stored secrets.

Affected Components

  • Workspace file injection into system prompt (TOOLS.md, AGENTS.md, etc.)
  • Heartbeat system (amplifies exposure frequency)
  • Session transcripts (persists secrets to disk)
  • All channel integrations (injection vectors for extraction)

Severity

High. The underlying vulnerability (system prompt extraction) is rated Medium in the current threat model, but the practical impact is elevated because:

  • The platform actively encourages storing secrets in the affected files
  • Compromise yields operational credentials with real-world access, not just agent instructions
  • Exposure is continuous and amplified by heartbeats

Evidence

We reproduced this in a real deployment. TOOLS.md contained a plaintext keyring password loaded into the system prompt on every agent turn. The fix was to remove all secrets from workspace files and fetch them at runtime from a secret manager (1Password CLI via service account).
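For reference, a minimal sketch of that runtime-fetch pattern, assuming the 1Password CLI (`op`) authenticated via a service account token in `OP_SERVICE_ACCOUNT_TOKEN`; the vault, item, and field names are illustrative.

```python
import subprocess

def fetch_secret(reference: str) -> str:
    """Resolve a secret at call time via the 1Password CLI instead of
    embedding it in a workspace file. `reference` is an op:// URI; the
    vault/item/field used below are illustrative. Requires
    OP_SERVICE_ACCOUNT_TOKEN (service account) in the environment."""
    result = subprocess.run(["op", "read", reference],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Fetched by a tool call at the moment the credential is needed;
# the value never appears in TOOLS.md or the system prompt.
keyring_password = fetch_secret("op://Agent/keyring/password")
```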

Suggested Mitigations

Quick wins:

  • Add explicit warnings in the TOOLS.md template: "Never store secrets here — contents are sent to the LLM on every API call"
  • Document the recommended pattern: use a secret manager and fetch credentials at runtime via tool calls

Detection:

  • Scan workspace files for high-entropy strings or known credential patterns (API key regexes like sk-ant-, sk-proj-, xoxb-, AKIA, etc.) before injecting them into the system prompt (see the sketch after this list)
  • Warn users if potential secrets are detected
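A minimal sketch of that pre-injection scan, combining a regex pass with a Shannon-entropy heuristic; the pattern list and the 4.0 bits/char threshold are illustrative assumptions, not a complete detector.

```python
import math
import re

# Illustrative prefixes for well-known key formats; a real deployment
# would reuse a maintained ruleset (gitleaks-style) instead.
CREDENTIAL_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"),   # Anthropic API keys
    re.compile(r"sk-proj-[A-Za-z0-9_-]{10,}"),  # OpenAI project keys
    re.compile(r"xoxb-[A-Za-z0-9-]{10,}"),      # Slack bot tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),            # AWS access key IDs
]

def shannon_entropy(s: str) -> float:
    """Bits per character; random tokens score well above English prose."""
    freqs = (s.count(c) / len(s) for c in set(s))
    return -sum(p * math.log2(p) for p in freqs)

def find_suspect_strings(text: str, threshold: float = 4.0) -> list[str]:
    """Tokens matching known credential patterns or looking high-entropy."""
    hits = [m.group(0) for p in CREDENTIAL_PATTERNS for m in p.finditer(text)]
    hits += [t for t in re.findall(r"\S{20,}", text)
             if shannon_entropy(t) >= threshold]
    return hits

# Run over each workspace file before it is injected; warn the user
# (or refuse to inject) when the result is non-empty.
```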

Architecture:

  • Support op://, env://, or secret:// references in workspace files that resolve at runtime rather than being embedded in the prompt (sketched after this list)
  • PR #9271 (zero-trust secure gateway) partially addresses this by isolating the agent in a container with placeholder credentials
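A minimal sketch of runtime reference resolution, assuming op:// references are read with the 1Password CLI and env:// references map to environment variables; the reference syntax and the secret:// scheme are illustrative and left unimplemented here.

```python
import os
import re
import subprocess

# Matches op://vault/item/field or env://VAR_NAME references left in
# workspace files (syntax is an assumption, not an existing feature).
REFERENCE = re.compile(r"\b(?:op|env)://[^\s\"'`]+")

def resolve(ref: str) -> str:
    scheme, _, rest = ref.partition("://")
    if scheme == "op":
        # 1Password CLI v2: `op read op://vault/item/field`; needs a
        # session or OP_SERVICE_ACCOUNT_TOKEN in the environment.
        out = subprocess.run(["op", "read", ref], capture_output=True,
                             text=True, check=True)
        return out.stdout.strip()
    if scheme == "env":
        return os.environ[rest]
    raise ValueError(f"unsupported secret reference: {ref}")

def expand_for_tool_call(text: str) -> str:
    """Expand references only when a tool actually needs the value;
    the system prompt keeps the opaque reference, never the secret."""
    return REFERENCE.sub(lambda m: resolve(m.group(0)), text)
```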

Related Threats

  • Amplifies T-DISC-003 (System Prompt Extraction) — extraction now yields credentials
  • Amplifies T-EXEC-001 / T-EXEC-002 (Prompt Injection) — injection payloads become credential harvesting
  • Enables T-EXFIL-001 (Data Theft via web_fetch) and T-EXFIL-003 (Credential Harvesting)
