This page explains how to implement runtime enforcement in real systems: where to place validation boundaries, how frequently to validate, and how to handle failures without turning enforcement into best-effort policy. It is operational guidance that assumes the guarantees are already understood.
Everything reduces to a single invariant:
Register. Validate. Work.
If validation fails, work does not begin.
Your only job as an implementer is to decide where validation boundaries sit and to enforce them there. If the boundary is vague, enforcement becomes vague.
Validation belongs at execution boundaries—points where work begins or commits side effects. Most systems already have these boundaries; they are simply not treated as enforcement points.
- Startup boundary: before an instance begins processing work
- Task boundary: before each unit of work (job, message, task, tool call)
- Side-effect boundary: before committing external actions (writes, payments, emails)
Pattern A: Startup gate (required)
Validate once at startup. If allowed is false, do not start the loop. Exit immediately.
register(device_id)
val = validate(device_id)
if not val.allowed:
    exit(1)
start_work_loop()
Pattern B: Per-unit-of-work gate (recommended)
Validate before each job/message/task. This makes revocation effective at predictable points.
while True:
    job = dequeue()
    val = validate(device_id)
    if not val.allowed:
        exit(1)
    process(job)
Pattern C: Side-effect gate (high leverage)
Validate immediately before irreversible actions: writes, transfers, external API calls, outbound messages.
val = validate(device_id)
if not val.allowed:
    exit(1)
commit_side_effect()
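One way to keep the side-effect gate from being forgotten is to attach it to the function that performs the side effect. The sketch below is a minimal illustration, not MachineID's client API: `gated`, `stub_validate`, and `send_email` are hypothetical names, and the only thing assumed about the validation result is that it exposes an `allowed` flag, as in the patterns above.

```python
import functools
import sys
from types import SimpleNamespace
from typing import Callable


def gated(validate: Callable[[str], object], device_id: str):
    """Build a decorator that validates immediately before the wrapped call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Validate at the last possible moment before the side effect.
            if not validate(device_id).allowed:
                sys.exit(1)  # denied: the side effect is never committed
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def stub_validate(device_id: str):
    """Stand-in validator (assumption); a real one would call the control plane."""
    return SimpleNamespace(allowed=True)


outbox = []


@gated(stub_validate, "agent:prod:planner:01")
def send_email(to: str) -> None:
    """Hypothetical side effect; here it only records the recipient."""
    outbox.append(to)
```

Calling `send_email("ops@example.com")` runs the validator first and only then appends to `outbox`. With a validator that returns `allowed=False`, the process exits with status 1 before the body runs.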
MachineID does not introspect internal loops. If a process runs for hours, you must define enforcement boundaries inside the loop.
Good boundaries are:
- Before each iteration that triggers a tool call
- Before each external request that can incur cost
- Before each write or irreversible side effect
- Before entering a high-cost sub-loop (batch runs, fan-out, recursion)
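The boundaries listed above can be sketched as explicit checkpoints inside a long-running loop. This is an illustrative sketch under stated assumptions: `checkpoint`, `run_steps`, and `EnforcementDenied` are hypothetical names, and the validator is any callable whose result exposes `allowed`. Raising an exception rather than exiting in place is a design choice made here so that the caller can release resources before exiting non-zero.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, List


class EnforcementDenied(Exception):
    """Raised when a checkpoint inside a long-running loop is denied."""


def checkpoint(validate: Callable[[str], object], device_id: str, boundary: str) -> None:
    """Validate at a named boundary; raise instead of continuing when denied."""
    if not validate(device_id).allowed:
        raise EnforcementDenied(f"{device_id} denied at {boundary} boundary")


def run_steps(steps: Iterable[Callable[[], str]], validate, device_id: str) -> List[str]:
    """Run each step only after its checkpoint passes."""
    results: List[str] = []
    for step in steps:
        # Boundary: before each iteration that triggers a tool call.
        checkpoint(validate, device_id, "tool-call")
        results.append(step())
    return results
```

A top-level caller would catch `EnforcementDenied`, log it, and exit with a non-zero status, so work stops at the first denied checkpoint and no later step runs.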
Treat validation like a safety-critical call: it must be fast, and it must fail safely.
If your runtime cannot tolerate fail-closed for certain workloads, you are describing a different system: one that accepts best-effort enforcement. That is explicitly outside the guarantees.
Decide your failure policy up front. Do not allow “it depends” behavior per team, per service, or per environment. Enforcement must be consistent.
Recommended policy: fail closed
- If validate fails (timeout/network): treat as allowed:false
- Exit the process or stop the worker loop
- Surface the failure via logs/alerts
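A fail-closed policy is easiest to keep consistent when it lives in one wrapper rather than at each call site. The following is a minimal sketch, assuming the underlying validator raises an exception on timeout or network failure and otherwise returns an object with an `allowed` attribute. `Validation`, `validate_fail_closed`, and the `reason` field are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Validation:
    """Hypothetical result shape; only `allowed` is taken from the text above."""
    allowed: bool
    reason: Optional[str] = None


def validate_fail_closed(validate: Callable[[str], object], device_id: str) -> Validation:
    """Call the validator and map every failure mode to allowed=False."""
    try:
        result = validate(device_id)
        # Only an explicit True counts as permission; anything else is a denial.
        allowed = result.allowed is True
        reason = getattr(result, "reason", None)
    except Exception as exc:
        # Timeout, network error, or malformed response: fail closed.
        return Validation(False, f"validation_error:{type(exc).__name__}")
    return Validation(allowed, reason)
```

Callers then branch on the returned `allowed` flag alone. They never see the exception, so a call site cannot choose to proceed anyway.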
The guarantee is binary. Avoid patterns like:
- “Proceed anyway but log a warning”
- “Continue for 10 minutes until checks recover”
- “Fall back to internal flags when validation is down”
Those patterns create a second authority path inside the runtime. That is precisely what an external control plane avoids.
MachineID enforces permission on an identity (a “device”) representing an execution surface. This identity should map to a specific runtime instance or logical worker identity—not a whole service.
- Good: one agent instance = one device
- Good: one worker replica = one device
- Risky: one entire cluster = one device (loses surgical control)
Device IDs should be stable enough to audit, but specific enough to revoke. A common pattern is:
{service}:{env}:{role}:{instance}
Examples:
agent:prod:planner:01
worker:prod:queue-consumer:07
job:prod:nightly-reindex:01
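Building IDs through one helper keeps the `{service}:{env}:{role}:{instance}` pattern uniform across services. The sketch below is illustrative: `make_device_id` is a hypothetical helper. Two details are assumptions made here, not documented MachineID requirements: the character restriction (lowercase letters, digits, hyphens) and the two-digit zero-padding.

```python
import re

# Assumed segment rule: lowercase alphanumerics and hyphens, no colons.
_SEGMENT = re.compile(r"[a-z0-9][a-z0-9-]*")


def make_device_id(service: str, env: str, role: str, instance: int) -> str:
    """Join the four segments into a {service}:{env}:{role}:{instance} ID."""
    segments = [service, env, role, f"{instance:02d}"]
    for seg in segments:
        # Reject anything that would make the ID ambiguous to split or audit.
        if not _SEGMENT.fullmatch(seg):
            raise ValueError(f"invalid device ID segment: {seg!r}")
    return ":".join(segments)
```

For example, `make_device_id("worker", "prod", "queue-consumer", 7)` yields `worker:prod:queue-consumer:07`, and a segment that contains a colon is rejected with a `ValueError`.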
Choose validation frequency based on risk:
- Low risk: validate at startup, then at coarse task boundaries (for example, per batch rather than per item)
- Medium risk: validate at startup and per unit of work
- High risk: validate at startup + per unit of work + before side effects
Enforcement without observability is operationally painful. At minimum, log:
- Device ID
- Validation result (allowed)
- Reason/error (when denied)
- Boundary type (startup / task / side-effect)
Keep logs structured. Treat enforcement denials as first-class operational events.
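The minimum fields above map naturally onto one JSON object per validation. This is a sketch using Python's standard `logging` and `json` modules. The field names, the `"enforcement"` logger name, and the choice of WARNING for denials are assumptions made for illustration.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("enforcement")


def log_validation(device_id: str, allowed: bool, boundary: str,
                   reason: Optional[str] = None) -> str:
    """Emit one structured line per validation and return it."""
    event = {
        "event": "validation",
        "device_id": device_id,
        "allowed": allowed,
        "boundary": boundary,  # "startup", "task", or "side-effect"
        "reason": reason,      # populated when denied
    }
    line = json.dumps(event, sort_keys=True)
    # Denials are logged at a higher level so they can drive alerts.
    logger.log(logging.INFO if allowed else logging.WARNING, line)
    return line
```

Because each line is a single JSON object with sorted keys, denials can be filtered by `boundary` or `device_id` without parsing free text.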
For agents, validate at startup, then before each tool call or task step. If denied, exit immediately.
register(agent_id)
if not validate(agent_id).allowed:
    exit(1)
for step in plan:
    if not validate(agent_id).allowed:
        exit(1)
    run_step(step)
For queue workers, validate before pulling work. If denied, stop consuming.
register(worker_id)
if not validate(worker_id).allowed:
    exit(1)
while True:
    if not validate(worker_id).allowed:
        exit(1)
    job = dequeue()
    process(job)
For batch or scheduled jobs, validate at job start. If denied, exit before doing any work.
register(job_id)
if not validate(job_id).allowed:
    exit(1)
run_job()
For message consumers, validate before consuming or before handling each message, depending on risk and throughput.
register(consumer_id)
if not validate(consumer_id).allowed:
    exit(1)
while True:
    msg = read_message()
    if not validate(consumer_id).allowed:
        exit(1)
    handle(msg)
For request handlers, validate before triggering any downstream side effects.
register(handler_id)
if not validate(handler_id).allowed:
    return 403
commit_side_effect()
- Every execution surface has a stable device identity
- Startup validation is enforced and fails closed
- Boundaries are explicit (task / side-effect)
- Revocation becomes effective at predictable checkpoints
- Timeouts are short and consistent
- Denials are logged and actionable