openai.BadRequestError with gpt-oss-120b - Expected 2 output messages (reasoning and final), but got 3. #1468

@02deno

Describe the bug

I get this error with the gpt-oss-120b model:

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'Expected 2 output messages (reasoning and final), but got 3.', 'type': 'BadRequestError', 'param': None, 'code': 400}
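
For triage or logging, it can help to pull the expected and actual message counts out of the error body. The following is only a sketch: `parse_message_count_error` is a hypothetical helper (not part of openai or openai-agents), and it assumes the error body is the dict quoted in the error and that the message wording does not change.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of any library. The pattern mirrors the wording
# of the message quoted in this report and may not match other server versions.
_COUNT_PATTERN = re.compile(r"Expected (\d+) output messages .*?, but got (\d+)\.")


def parse_message_count_error(body: dict) -> Optional[Tuple[int, int]]:
    """Return (expected, got) if the body matches the reported error, else None."""
    match = _COUNT_PATTERN.search(str(body.get("message", "")))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Error body copied from the report.
error_body = {
    "object": "error",
    "message": "Expected 2 output messages (reasoning and final), but got 3.",
    "type": "BadRequestError",
    "param": None,
    "code": 400,
}
print(parse_message_count_error(error_body))
```

A `None` result means the body did not match this specific error, so other 400 responses can be handled separately.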

Debug information

openai -> 1.99.1
openai-agents -> 0.2.5
python -> 3.11.9
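
A version list like this one can be collected programmatically, which avoids typing mistakes. This is a minimal sketch using only the standard library; `collect_versions` is a hypothetical helper name, and the package names passed in are the ones listed in this report.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


# Package names taken from the debug information in this report.
report = collect_versions(["openai", "openai-agents"])
report["python"] = platform.python_version()
for name, value in report.items():
    print(f"{name} -> {value}")
```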

This is my code:

import asyncio, json
from typing import Any, Dict, List, Optional
from dataclasses import dataclass, field

import httpx
import mlflow
from openai import AsyncOpenAI
from agents import Agent, OpenAIChatCompletionsModel, Runner, WebSearchTool, enable_verbose_stdout_logging, function_tool, set_default_openai_api, set_default_openai_client, set_trace_processors
from pydantic import BaseModel, Field
import requests

enable_verbose_stdout_logging()
set_trace_processors([])  # disable OpenAI tracing

# Enable auto tracing for OpenAI Agents SDK
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("OpenAI Agent Plan")

class Step(BaseModel):
    agent: str
    input: Any = "last"   # could be "last" or an object

class Plan(BaseModel):
    stages: List[List[Step]] = Field(default_factory=list)
    target_language: Optional[str] = None

class WebSearchResult(BaseModel):
    result: str

# Note: gpt_url and gpt_model_name are not defined in this snippet.
local_client = AsyncOpenAI(
    base_url=gpt_url,
    api_key="local",
    http_client=httpx.AsyncClient(verify=False),
    timeout=10,
)
set_default_openai_client(local_client)
set_default_openai_api("chat_completions")

model = OpenAIChatCompletionsModel(model=gpt_model_name, openai_client=local_client)


# Agent registry (add/remove freely) 
@dataclass
class AgentSpec:
    name: str
    model: str
    instructions: str
    capabilities: List[str]            # natural-language descriptors, e.g. ["retrieve fresh info from the web", "web search"]
    input_kind: str                    # "text", "list[doc]", "text+lang", etc.
    output_kind: str                  # "text", "list[doc]", etc.
    tools: list = field(default_factory=list)
    output_type: Any = str

    def build(self) -> Agent:
        return Agent(name=self.name, model=self.model, instructions=self.instructions, tools=self.tools, output_type=self.output_type)

REGISTRY: Dict[str, AgentSpec] = {}

def register(spec: AgentSpec): REGISTRY[spec.name] = spec
def remove(name: str): REGISTRY.pop(name, None)

@function_tool
def web_search_tool(query: str) -> WebSearchResult:
    # Simulate a deterministic result
    result = (
        "AI-First apps becoming default; rise of AI agents; "
        "multimodal reasoning in production; small specialized models; "
        "governance & evals as first-class."
    )
    return WebSearchResult(result=result)



# Register ONLY the two you want for now
register(AgentSpec(
    name="WebSearch",
    model=model,
    instructions=(
        "Call web_search_tool exactly once with the user query. "
        "Then RETURN ONLY the tool's string result as your final answer. "
        "Do not add any extra words, formatting, or analysis. Do not call any other tools."
    ),
    capabilities=[
        "find fresh information on the internet",
        "retrieve reports, news, rankings, analyses",
        "produce a combined string of web search results"
    ],
    input_kind="text",
    output_kind="text",
    output_type=WebSearchResult,
    tools=[web_search_tool],
))

register(AgentSpec(
    name="Translator",
    model=model,
    instructions="Translate given text into a specified target language. Preserve structure and bullets.",
    capabilities=[
        "translate text to a target language",
        "preserve formatting while translating"
    ],
    input_kind="text+target_language",
    output_kind="text"
))

# ── Neutral, capability-driven planner (no agent-specific rules) ────────────
PLANNER = Agent(
    name="Planner",
    model=model,
    instructions=(
        "You design a minimal pipeline from a live registry of agents.\n"
        "Inputs:\n"
        " - user_query: free text\n"
        " - target_language: optional language name (null if not requested)\n"
        " - registry: list of agents with fields {name, capabilities[], input_kind, output_kind}\n\n"
        "Constraints and objectives:\n"
        " - Choose steps whose capabilities semantically satisfy the user intent.\n"
        " - Ensure I/O compatibility between consecutive steps (output_kind → input_kind).\n"
        " - Prefer the fewest steps that satisfy the intent; parallelize only if it reduces total work.\n"
        " - If a step requires a parameter (e.g., target language), include it in the step input object.\n"
        " - Do NOT rely on specific agent names; select by capability semantics.\n"
        "Return ONLY valid JSON matching this schema: "
        '{"stages":[[{"agent":"<name>","input":"last" or object}]],"target_language":string|null}'
    ),
    output_type=Plan
)

async def run(agent: Agent, content: Any) -> Any:
    if not isinstance(content, (str, list)):
        content = json.dumps(content, ensure_ascii=False)
    stream = Runner.run_streamed(agent, content)
    async for _ in stream.stream_events(): pass
    return stream.final_output

async def plan_and_execute(user_query: str, target_language: Optional[str] = None):
    # Build real agents
    built = {name: spec.build() for name, spec in REGISTRY.items()}

    # Prepare registry view (names included for selection, but planner is told not to rely on them)
    registry_view = [{
        "name": spec.name,
        "capabilities": spec.capabilities,
        "input_kind": spec.input_kind,
        "output_kind": spec.output_kind
    } for spec in REGISTRY.values()]

    plan_input = {
        "user_query": user_query,
        "target_language": target_language,
        "registry": registry_view
    }
    plan_list = json.dumps(plan_input)
    plan_json = await run(PLANNER, plan_list)
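    # final_output may already be a Plan; a raw JSON string is validated into one
    # so that plan.stages below works in both cases.
    plan = Plan.model_validate_json(plan_json) if isinstance(plan_json, str) else plan_json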

    last = None
    for stage in plan.stages:
        tasks = []
        for step in stage:
            name = step.agent
            payload = getattr(step, "input", "last")
            if payload == "last":
                payload = last
            # Small helper to pass target language when required by capability
            if name == "Translator" and not isinstance(payload, dict):
                payload = {"text": last if last is not None else user_query,
                           "target_language": target_language or "en"}
            tasks.append(run(built[name], payload))
        results = await asyncio.gather(*tasks)
        last = results if len(results) > 1 else results[0]

    return {"plan": plan, "result": last}

# Demo
if __name__ == "__main__":
    async def main():
        # Example 1: “find and summarize” – with only two agents, planner should pick WebSearch;
        # (No explicit summarizer registered yet.)
        q1 = "what are the most popular AI trends now? find them and give a concise output"
        o1 = await plan_and_execute(q1)
        print("\n=== PLAN 1 ==="); print(o1["plan"])
        print("\n=== RESULT 1 ==="); print(o1["result"])
        
    asyncio.run(main())

Example output of model:

{
	"id": "chatcmpl-f7af84495ecb48c59e02fd8d74056fda",
	"object": "chat.completion",
	"created": 1755085699,
	"model": "gpt-oss-120b",
	"choices": [
		{
			"index": 0,
			"message": {
				"role": "assistant",
				"content": "## Docker – an Overview\n\n**Docker** is an open‑source platform that makes it easy to build, ship, and run applications inside **containers**. A container bundles an application’s code together with everything it needs to run—runtime, system libraries, dependencies, and configuration—so it behaves the same way on any host that runs Docker.\n\n---\n\n### 1. Why Containers (and Docker)?\n\n| Feature | Virtual Machines (VMs) | Docker Containers |\n|---------|------------------------|-------------------|\n| **Isolation** | Full OS + hypervisor → heavyweight | Namespace + cgroups → lightweight |\n| **Boot time** | Seconds to minutes | Milliseconds |\n| **Size** | GBs (full OS) | MBs (just the app + libs) |\n| **Resource usage** | Separate kernel, duplicated OS services | Shares host kernel; only app-level processes |\n| **Portability** | Needs same hypervisor type (VMware, Hyper‑V, etc.) | Same Docker image runs on any Docker engine (Linux, macOS, Windows) |\n\nBecause containers are tiny, start fast, and are portable, they have become the de‑facto standard for:\n\n- **Micro‑service architectures**\n- **Continuous Integration / Continuous Deployment (CI/CD) pipelines**\n- **Development‑to‑production parity**\n- **Scaling applications in the cloud (Docker Swarm, Kubernetes)**\n\n---\n\n### 2. Core Docker Concepts\n\n| Term | Meaning |\n|------|----------|\n| **Docker Engine** | The daemon (`dockerd`) that runs on a host, exposing the Docker API. It manages images, containers, networks, and volumes. |\n| **Image** | A read‑only template that contains the file system and metadata needed to run a container. Built from a `Dockerfile`. |\n| **Container** | A runtime instance of an image, plus a thin writable layer on top. It has its own process space, network interfaces, and storage (volumes). |\n| **Dockerfile** | A simple, line‑by‑line script that tells Docker how to build an image (base image, copy files, run commands, expose ports, etc.). 
|\n| **Registry** | A place to store and share images. The public default is **Docker Hub** (`hub.docker.com`), but you can run private registries (e.g., `Harbor`, `Amazon ECR`, `GitHub Packages`). |\n| **Volume** | Persistent storage outside the container’s layered filesystem, useful for databases, logs, etc. |\n| **Network** | Docker creates isolated networks (`bridge`, `host`, `overlay`, `macvlan`) allowing containers to communicate securely. |\n| **Compose** | A YAML‑based tool (`docker-compose`) to define multi‑container applications (services, networks, volumes) and bring them up with a single command. |\n| **Swarm / Kubernetes** | Orchestrators that manage many Docker hosts (clusters), handling scheduling, scaling, load‑balancing, and self‑healing. |\n\n---\n\n### 3. Typical Docker Workflow\n\n1. **Write a Dockerfile** for your app.  \n2. **Build** an image: `docker build -t myapp:1.0 .`  \n3. **Test** locally: `docker run -p 8080:80 myapp:1.0`  \n4. **Push** to a registry: `docker push myrepo/myapp:1.0`  \n5. **Deploy** on a server, swarm, or Kubernetes cluster.\n\n---\n\n### 4. Simple Example\n\n#### a. Sample `Dockerfile` (Node.js app)\n\n```dockerfile\n# Use an official lightweight Node runtime as the base image\nFROM node:20-alpine\n\n# Set working directory inside the container\nWORKDIR /usr/src/app\n\n# Install app dependencies (only copy package.json + lock first for caching)\nCOPY package*.json ./\nRUN npm ci --only=production\n\n# Copy the rest of the application source code\nCOPY . .\n\n# Expose the port the App runs on\nEXPOSE 3000\n\n# Define the command to run the app\nCMD [\"node\", \"index.js\"]\n```\n\n#### b. 
Build & Run\n\n```bash\n# Build the image (tag it as mynodeapp:latest)\ndocker build -t mynodeapp:latest .\n\n# Run a container (map host port 3000 → container port 3000)\ndocker run -d -p 3000:3000 --name mynodeapp_container mynodeapp:latest\n```\n\nYou can now open `http://localhost:3000` in a browser and see the app.\n\n---\n\n### 5. Docker Commands Cheat‑Sheet\n\n| Command | Purpose |\n|---------|---------|\n| `docker version` | Show Docker client & daemon version |\n| `docker info` | Display system‑wide info (running containers, storage driver, etc.) |\n| `docker pull <image>` | Download an image from a registry |\n| `docker push <image>` | Upload an image to a registry |\n| `docker images` | List local images |\n| `docker rmi <image>` | Remove an image |\n| `docker run [options] <image> [cmd]` | Create & start a container |\n| `docker ps` | List running containers |\n| `docker ps -a` | List all containers (including stopped) |\n| `docker exec -it <container> /bin/sh` | Open an interactive shell inside a running container |\n| `docker logs <container>` | View container logs |\n| `docker stop/start/restart <container>` | Control container lifecycle |\n| `docker rm <container>` | Delete a stopped container |\n| `docker volume ls` | List volumes |\n| `docker network ls` | List networks |\n| `docker compose up -d` | Bring up a multi‑service app defined in `docker-compose.yml` |\n| `docker compose down` | Stop and remove the compose stack |\n\n---\n\n### 6. Docker vs. 
Alternatives\n\n| Feature | Docker | Podman (Red Hat) | LXC/LXD | rkt (CoreOS, now deprecated) |\n|---------|--------|------------------|----------|------------------------------|\n| Daemonless | No (needs `dockerd`) | Yes (daemon‑less) | Yes (system‑level) | No |\n| Compatibility with Docker CLI | ✔︎ | ✔︎ (Docker‑compatible) | ❌ | ❌ |\n| Rootless mode | ✔︎ (since 19.03) | ✔︎ (native) | ✔︎ | ✔︎ |\n| Integration with Kubernetes | Native (`dockershim` now deprecated, but images work) | Works as OCI runtime | Not typical | Not typical |\n| Security model | Namespaces + cgroups, AppArmor/SELinux profiles | Similar but more granular defaults | Uses system containers + security policies | N/A |\n\nDocker remains the most widely used container engine, largely because of its ecosystem, tooling, and community support.\n\n---\n\n### 7. Best Practices\n\n| Area | Recommendation |\n|------|----------------|\n| **Image size** | Use minimal base images (`alpine`, `scratch`) when possible. Combine `RUN` steps to reduce layers. |\n| **Caching** | Order Dockerfile commands from least‑to‑most‑changing (e.g., install dependencies before copying source code). |\n| **Security** | Run processes as non‑root (`USER appuser`). Scan images (`docker scan` or tools like Trivy, Clair). Enable **Docker Content Trust** (`DOCKER_CONTENT_TRUST=1`). |\n| **Secrets** | Don’t bake passwords/keys into images. Use Docker secrets (Swarm) or external secret managers (Vault, AWS Secrets Manager). |\n| **Logging** | Write to `stdout`/`stderr`; let the host capture logs via `docker logs` or a logging driver (json‑file, syslog, fluentd). |\n| **Health checks** | Add `HEALTHCHECK` instructions in Dockerfile so orchestrators can schedule restarts. |\n| **Multi‑stage builds** | Use a builder image to compile code, then copy the final artifact into a lighter runtime image. |\n| **Version pinning** | Explicitly tag images (`node:20-alpine`) rather than using `latest` in production. 
|\n| **Resource limits** | Set CPU & memory limits (`--cpus`, `--memory`) especially in shared environments. |\n\n---\n\n### 8. Security Model – Quick Dive\n\n1. **Kernel Namespaces**  \n   - PID, NET, IPC, MNT, UTS, USER: isolate process IDs, network stacks, inter‑process communication, mount points, hostname, and user IDs.\n\n2. **cgroups (Control Groups)**  \n   - Enforce resource quotas (CPU, memory, blk‑io, etc.) per container.\n\n3. **Capabilities**  \n   - Linux capabilities are split; containers run without `CAP_SYS_ADMIN` by default.\n\n4. **AppArmor/SELinux Profiles**  \n   - Apply additional mandatory access controls; e.g., Docker’s default `docker-default` profile.\n\n5. **Rootless Docker**  \n   - Starts the daemon as a regular user, reducing the attack surface. Works on most modern kernels (user namespaces).\n\n6. **Image Signing (Notary / Docker Content Trust)**  \n   - Cryptographically sign images to guarantee integrity and provenance.\n\n7. **Runtime Security Tools**  \n   - Falco, Sysdig, Aqua, Twistlock/Prisma Cloud, or open‑source tools like Trivy for vulnerability scanning.\n\n---\n\n### 9. Docker in the Cloud\n\n| Cloud Provider | Docker‑related Offerings |\n|----------------|---------------------------|\n| **AWS** | Amazon Elastic Container Service (ECS) – runs Docker containers; Amazon Elastic Kubernetes Service (EKS) – Kubernetes using Docker images; Elastic Container Registry (ECR). |\n| **Azure** | Azure Container Instances (ACI); Azure Kubernetes Service (AKS); Azure Container Registry (ACR). |\n| **Google Cloud** | Google Kubernetes Engine (GKE); Cloud Run (fully managed Knative, runs containers). |\n| **DigitalOcean** | App Platform (builds from Dockerfiles) and Managed Kubernetes. |\n| **GitHub** | GitHub Packages (container registry) + GitHub Actions (build/push Docker images). |\n\nMost CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, CircleCI, Azure Pipelines, etc.) have first‑class support for Docker.\n\n---\n\n### 10. 
Common Misconceptions\n\n| Myth | Reality |\n|------|----------|\n| *Docker = Virtual Machine* | Docker uses OS‑level isolation, not full hardware virtualization. |\n| *Docker containers are completely secure* | They isolate processes but share the kernel; misconfiguration can lead to breakout. Use rootless mode & security best‑practices. |\n| *Docker replaces the need for configuration management* | Docker standardizes the runtime environment, but you still need tools (Ansible, Terraform) for provisioning hosts, networking, and storage. |\n| *All images are the same* | Images vary dramatically in size, base layers, and security posture. Choose minimal, vetted bases and scan them. |\n| *Docker is only for developers* | Ops, SRE, data‑science and security teams all leverage Docker for reproducible environments, testing, and deployment. |\n\n---\n\n### 11. Getting Started – Quick Checklist\n\n1. **Install Docker Engine**  \n   - Linux: `apt-get install docker.io` (Ubuntu), `dnf install docker` (Fedora) or use the official Docker CE packages.  \n   - macOS / Windows: Download **Docker Desktop** (includes Docker Engine, Compose, and a UI).  \n\n2. **Verify**  \n   ```bash\n   docker run hello-world\n   ```\n   You should see a “Hello from Docker!” message.\n\n3. **Create a simple app** (e.g., a static `index.html` + Nginx) and write a Dockerfile.  \n\n4. **Build & Run** using the commands above.  \n\n5. **Explore**: `docker ps`, `docker logs`, `docker exec -it <id> sh`.  \n\n6. **Persist data**: Create a volume (`docker volume create mydata`) and mount it (`-v mydata:/data`).  \n\n7. **Compose**: Write a `docker-compose.yml` for multi‑service apps (e.g., web + db).  \n\n8. **Push to a registry** (Docker Hub or a private registry).  \n\n9. **Deploy** to a cloud service or a local Swarm/Kubernetes cluster.\n\n---\n\n### 12. 
Further Learning Resources\n\n| Format | Link |\n|--------|------|\n| **Official Docs** | <https://docs.docker.com/> |\n| **Dockerfile Reference** | <https://docs.docker.com/engine/reference/builder/> |\n| **Docker Compose** | <https://docs.docker.com/compose/> |\n| **Docker Hub** | <https://hub.docker.com/> |\n| **Free Interactive Tutorial** | <https://labs.play-with-docker.com/> |\n| **Books** | *Docker Deep Dive* (Nigel Poulton), *The Docker Book* (James Turnbull) |\n| **Videos** | Docker Meetups on YouTube, “Docker for Beginners” series by Bret Fisher |\n| **Security Scanning** | Trivy: <https://github.com/aquasecurity/trivy> |\n| **Community** | Docker Community Forums, Stack Overflow tag `docker`, r/docker on Reddit |\n\n---\n\n## TL;DR\n\nDocker is a lightweight container platform that packages an application and its dependencies into a portable, isolated image that runs as a container on any host with the Docker Engine. It brings VM‑like isolation with far less overhead, enabling rapid development, consistent environments, and scalable deployment pipelines. Master the core concepts (image, container, Dockerfile, registry, compose), follow best‑practice patterns (small base images, multi‑stage builds, security hardening), and you’ll be able to ship software reliably from a laptop all the way to production clouds.  \n\nIf you have a specific language, framework, or deployment scenario you’d like to explore in detail, let me know and I can walk you through a tailored Dockerfile and workflow!",
				"refusal": null,
				"annotations": null,
				"audio": null,
				"function_call": null,
				"tool_calls": [],
				"reasoning_content": "User asks: \"what is docker?\" Likely expects an explanation. Should define Docker, its purpose, containerization, architecture, components, use cases, benefits, differences from VM, basic commands, sample Dockerfile, popular images, best practices, security, etc. Could ask if they'd like deeper dive. Answer in straightforward manner. Provide resources. Possibly add code examples. Also potential follow-up: ask which OS they use, etc. Write comprehensive answer. Since they asked simply, provide overview. Should be concise but thorough. Probably no need to ask clarifying question unless required. Provide answer."
			},
			"logprobs": null,
			"finish_reason": "stop",
			"stop_reason": null
		}
	],
	"service_tier": null,
	"system_fingerprint": null,
	"usage": {
		"prompt_tokens": 75,
		"total_tokens": 3192,
		"completion_tokens": 3117,
		"prompt_tokens_details": null
	},
	"prompt_logprobs": null,
	"kv_transfer_params": null
}
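
In the response above, the model's reasoning arrives in `reasoning_content` and the answer in `content`, i.e. two parts per choice. The sketch below separates them from a parsed response. It assumes the response has the shape shown above; `reasoning_content` appears to be a server-side extension rather than a standard Chat Completions field, and `split_reasoning_and_final` is a hypothetical helper name.

```python
import json


def split_reasoning_and_final(response: dict) -> list:
    """Return one (reasoning, final) text pair per choice in the response."""
    pairs = []
    for choice in response.get("choices", []):
        message = choice.get("message", {})
        # Either field may be missing or null, so fall back to an empty string.
        reasoning = message.get("reasoning_content") or ""
        final = message.get("content") or ""
        pairs.append((reasoning, final))
    return pairs


# Abbreviated stand-in for the response above (texts shortened for illustration).
raw = """
{"choices": [{"index": 0, "message": {"role": "assistant",
  "content": "## Docker - an Overview",
  "tool_calls": [],
  "reasoning_content": "User asks: what is docker?"}}]}
"""
pairs = split_reasoning_and_final(json.loads(raw))
for reasoning, final in pairs:
    print("reasoning:", reasoning)
    print("final:", final)
```

Running this against the raw server response, outside the Agents SDK, is one way to check how many parts the server returns for a given request.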
