Perform chat completion inference
Generally available; Added in 8.18.0

POST /_inference/chat_completion/{inference_id}/_stream

The chat completion inference API enables real-time responses for chat completion tasks by delivering answers incrementally, so output starts arriving while the rest of the answer is still being generated. It only works with the chat_completion task type for openai and elastic inference services.

NOTE: The chat_completion task type is only available within the _stream API and only supports streaming. The Chat completion inference API and the Stream inference API differ in their response structure and capabilities. The Chat completion inference API provides more comprehensive customization options through more fields and function calling support. If you use the openai, hugging_face or the elastic service, use the Chat completion inference API.

Path parameters

  • inference_id string Required

    The inference ID.

Query parameters

  • timeout string

    Specifies the amount of time to wait for the inference request to complete.

    Values are -1 or 0.

application/json

Body Required

  • messages array[object] Required

    A list of objects representing the conversation. Requests should generally only add new messages from the user (role user). The other message roles (assistant, system, or tool) should generally only be copied from the response to a previous completion request, such that the messages array is built up throughout a conversation.

    An object representing part of the conversation.

    • content string | array[object]

      The content of the message.

      String example:

      {
         "content": "Some string"
      }
      

      Object example:

      {
        "content": [
            {
             "text": "Some text",
             "type": "text"
            }
         ]
      }
      
    • role string Required

      The role of the message author. Valid values are user, assistant, system, and tool.

    • tool_call_id string

      Only for tool role messages. The tool call that this message is responding to.

    • tool_calls array[object]

      Only for assistant role messages. The tool calls generated by the model. If it's specified, the content field is optional. Example:

      {
        "tool_calls": [
            {
                "id": "call_KcAjWtAww20AihPHphUh46Gd",
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": "{\"location\":\"Boston, MA\"}"
                }
            }
        ]
      }
      

      A tool call generated by the model.

      • id string Required

        The identifier of the tool call.

      • function object Required

        The function that the model called.

        • arguments string Required

          The arguments to call the function with in JSON format.

        • name string Required

          The name of the function to call.

      • type string Required

        The type of the tool call.

  • model string

    The ID of the model to use.

  • max_completion_tokens number

    The upper bound limit for the number of tokens that can be generated for a completion request.

  • stop array[string]

    A sequence of strings to control when the model should stop generating additional tokens.

  • temperature number

    The sampling temperature to use.

  • tool_choice string | object

    Controls which tool is called by the model. String representation: one of auto, none, or required. auto allows the model to choose between calling tools and generating a message. none causes the model to not call any tools. required forces the model to call one or more tools. Example (object representation):

    {
      "tool_choice": {
          "type": "function",
          "function": {
              "name": "get_current_weather"
          }
      }
    }
    
  • tools array[object]

    A list of tools that the model can call. Example:

    {
      "tools": [
          {
              "type": "function",
              "function": {
                  "name": "get_price_of_item",
                  "description": "Get the current price of an item",
                  "parameters": {
                      "type": "object",
                      "properties": {
                          "item": {
                              "id": "12345"
                          },
                          "unit": {
                              "type": "currency"
                          }
                      }
                  }
              }
          }
      ]
    }
    

    A list of tools that the model can call.

    • type string Required

      The type of tool.

    • function object Required

      The function definition.

      • description string

        A description of what the function does. This is used by the model to choose when and how to call the function.

      • name string Required

        The name of the function.

      • parameters object

        The parameters the function accepts. This should be formatted as a JSON object.

      • strict boolean

        Whether to enable schema adherence when generating the function call.

  • top_p number

    Nucleus sampling, an alternative to sampling with temperature.
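The body fields above can be assembled programmatically. The following is a minimal Python sketch, not part of any Elastic client: `build_chat_request` is a hypothetical helper introduced here for illustration, and its checks only mirror the constraints stated in this reference (the valid roles, `tool_call_id` only on tool messages, `tool_calls` only on assistant messages, and the three string values of `tool_choice`). Treating an empty `messages` list as missing is an assumption.

```python
import json

VALID_ROLES = {"user", "assistant", "system", "tool"}
VALID_TOOL_CHOICES = {"auto", "none", "required"}


def build_chat_request(messages, model=None, tools=None, tool_choice=None,
                       temperature=None, top_p=None, stop=None,
                       max_completion_tokens=None):
    """Assemble a request body dict, checking the constraints listed above.

    Hypothetical helper for illustration only.
    """
    # Assumption: an empty list is treated the same as a missing field.
    if not messages:
        raise ValueError("messages is required")
    for msg in messages:
        role = msg.get("role")
        if role not in VALID_ROLES:
            raise ValueError(f"invalid role: {role!r}")
        if "tool_call_id" in msg and role != "tool":
            raise ValueError("tool_call_id is only for tool role messages")
        if "tool_calls" in msg and role != "assistant":
            raise ValueError("tool_calls is only for assistant role messages")
    # Only the string form is checked; the object form is passed through.
    if isinstance(tool_choice, str) and tool_choice not in VALID_TOOL_CHOICES:
        raise ValueError(f"invalid tool_choice: {tool_choice!r}")

    body = {"messages": list(messages)}
    optional = {
        "model": model,
        "tools": tools,
        "tool_choice": tool_choice,
        "temperature": temperature,
        "top_p": top_p,
        "stop": stop,
        "max_completion_tokens": max_completion_tokens,
    }
    # Omit optional fields that were not supplied.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


example_body = build_chat_request(
    [{"role": "user", "content": "What is Elastic?"}],
    model="gpt-4o",
    tool_choice="auto",
)
print(json.dumps(example_body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the request body. A misspelled string such as `"requrired"` raises `ValueError` before any request is made.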

Responses

  • 200 application/json
Console
POST _inference/chat_completion/openai-completion/_stream
{
  "model": "gpt-4o",
  "messages": [
      {
          "role": "user",
          "content": "What is Elastic?"
      }
  ]
}

Python
resp = client.inference.chat_completion_unified(
    inference_id="openai-completion",
    chat_completion_request={
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": "What is Elastic?"
            }
        ]
    },
)

JavaScript
const response = await client.inference.chatCompletionUnified({
  inference_id: "openai-completion",
  chat_completion_request: {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: "What is Elastic?",
      },
    ],
  },
});

Ruby
response = client.inference.chat_completion_unified(
  inference_id: "openai-completion",
  body: {
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "What is Elastic?"
      }
    ]
  }
)

PHP
$resp = $client->inference()->chatCompletionUnified([
    "inference_id" => "openai-completion",
    "body" => [
        "model" => "gpt-4o",
        "messages" => array(
            [
                "role" => "user",
                "content" => "What is Elastic?",
            ],
        ),
    ],
]);

curl
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"model":"gpt-4o","messages":[{"role":"user","content":"What is Elastic?"}]}' "$ELASTICSEARCH_URL/_inference/chat_completion/openai-completion/_stream"

Java
client.inference().chatCompletionUnified(c -> c
    .inferenceId("openai-completion")
    .chatCompletionRequest(ch -> ch
        .messages(m -> m
            .content(co -> co
                .string("What is Elastic?")
            )
            .role("user")
        )
        .model("gpt-4o")
    )
);

Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion on the example question with streaming.
{
  "model": "gpt-4o",
  "messages": [
      {
          "role": "user",
          "content": "What is Elastic?"
      }
  ]
}
Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion using an Assistant message with `tool_calls`.
{
  "messages": [
      {
          "role": "assistant",
          "content": "Let's find out what the weather is",
          "tool_calls": [ 
              {
                  "id": "call_KcAjWtAww20AihPHphUh46Gd",
                  "type": "function",
                  "function": {
                      "name": "get_current_weather",
                      "arguments": "{\"location\":\"Boston, MA\"}"
                  }
              }
          ]
      },
      { 
          "role": "tool",
          "content": "The weather is cold",
          "tool_call_id": "call_KcAjWtAww20AihPHphUh46Gd"
      }
  ]
}
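The tool role message in the request above answers the assistant's tool call by repeating its `id` as `tool_call_id`. As a rough Python sketch of how a caller might produce such messages: `answer_tool_calls`, `LOCAL_TOOLS`, and the stub `get_current_weather` are hypothetical names introduced for illustration, and the stub simply returns the text used in the example rather than calling a real service.

```python
import json


def get_current_weather(location):
    # Hypothetical local implementation; a real one would query a weather service.
    return "The weather is cold"


# Hypothetical registry mapping function names to local callables.
LOCAL_TOOLS = {"get_current_weather": get_current_weather}


def answer_tool_calls(assistant_message):
    """Return one tool role message per tool call in the assistant message."""
    replies = []
    for call in assistant_message.get("tool_calls", []):
        func = call["function"]
        # `arguments` is a JSON-encoded string, so decode it before use.
        kwargs = json.loads(func["arguments"])
        result = LOCAL_TOOLS[func["name"]](**kwargs)
        replies.append({
            "role": "tool",
            "content": str(result),
            # Link the reply to the call it answers.
            "tool_call_id": call["id"],
        })
    return replies


assistant_message = {
    "role": "assistant",
    "content": "Let's find out what the weather is",
    "tool_calls": [
        {
            "id": "call_KcAjWtAww20AihPHphUh46Gd",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": '{"location":"Boston, MA"}',
            },
        }
    ],
}

# The messages array for the follow-up request: the assistant message
# copied from the previous response, then the tool results.
messages = [assistant_message] + answer_tool_calls(assistant_message)
print(json.dumps(messages, indent=2))
```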
Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion using a User message with `tools` and `tool_choice`.
{
  "messages": [
      {
          "role": "user",
          "content": [
              {
                  "type": "text",
                  "text": "What's the price of a scarf?"
              }
          ]
      }
  ],
  "tools": [
      {
          "type": "function",
          "function": {
              "name": "get_current_price",
              "description": "Get the current price of an item",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "item": {
                          "id": "123"
                      }
                  }
              }
          }
      }
  ],
  "tool_choice": {
      "type": "function",
      "function": {
          "name": "get_current_price"
      }
  }
}
Response examples (200)
A successful response when performing a chat completion task using a User message with `tools` and `tool_choice`.
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"Elastic"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":" is"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

(...)

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44}}} 

event: message
data: [DONE]
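
The stream above is a sequence of `event:` and `data:` lines that ends with `data: [DONE]`. A minimal Python sketch of consuming it follows. `collect_stream` is a hypothetical helper, not a client API, and it assumes every `data:` payload other than `[DONE]` is a JSON object with a top-level `chat_completion` key, as in the excerpt shown. The sample input contains only the events shown above, without the elided ones.

```python
import json


def collect_stream(lines):
    """Join the streamed delta content and return (text, usage).

    `lines` is any iterable of decoded text lines from the response body.
    """
    parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event: message" lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)["chat_completion"]
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        # In the excerpt, token usage arrives in a chunk with empty choices.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage


sample = '''event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"Elastic"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":" is"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44}}}

event: message
data: [DONE]
'''

text, usage = collect_stream(sample.splitlines())
print(text)   # Elastic is
print(usage)  # {'completion_tokens': 28, 'prompt_tokens': 16, 'total_tokens': 44}
```

With a live connection, the same function could be fed the response body line by line, so partial text can be displayed as each chunk arrives instead of after the full answer.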