Update a trained model deployment

Generally available; Added in 8.6.0

POST /_ml/trained_models/{model_id}/deployment/_update

Required authorization

  • Cluster privileges: manage_ml

Path parameters

  • model_id string Required

    The unique identifier of the trained model. Currently, only PyTorch models are supported.

Query parameters

  • number_of_allocations number

    The number of model allocations on each node where the model is deployed. All allocations on a node share the same copy of the model in memory but use a separate set of threads to evaluate the model. Increasing this value generally increases throughput. If this setting is greater than the number of hardware threads, it is automatically changed to a value less than the number of hardware threads.

application/json

Body

  • number_of_allocations number

    The number of model allocations on each node where the model is deployed. All allocations on a node share the same copy of the model in memory but use a separate set of threads to evaluate the model. Increasing this value generally increases throughput. If this setting is greater than the number of hardware threads, it is automatically changed to a value less than the number of hardware threads. If adaptive_allocations is enabled, do not set this value, because it is set automatically.

    Default value is 1.

  • adaptive_allocations object

    Adaptive allocations configuration. When enabled, the number of allocations is set based on the current load. If adaptive_allocations is enabled, do not set the number of allocations manually.

    • enabled boolean Required

      If true, adaptive_allocations is enabled.

    • min_number_of_allocations number

      Specifies the minimum number of allocations to scale to. If set, it must be greater than or equal to 0. If not defined, the deployment scales to 0.

    • max_number_of_allocations number

      Specifies the maximum number of allocations to scale to. If set, it must be greater than or equal to min_number_of_allocations.
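The body constraints described above can be sketched as a small check that runs before the request is sent. This is a minimal illustration, not part of any Elasticsearch client: `build_update_body` and its keyword arguments are hypothetical names, and the checks only mirror the rules stated in this section (enabled is required inside adaptive_allocations, the minimum must be at least 0, the maximum must be at least the minimum, and number_of_allocations should not be set while adaptive allocations are enabled).

```python
from typing import Any, Dict, Optional


def build_update_body(
    number_of_allocations: Optional[int] = None,
    adaptive_enabled: Optional[bool] = None,
    min_allocations: Optional[int] = None,
    max_allocations: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an update body and apply the documented rules."""
    body: Dict[str, Any] = {}

    if adaptive_enabled is None:
        # "enabled" is required whenever the adaptive_allocations object is sent.
        if min_allocations is not None or max_allocations is not None:
            raise ValueError("min/max allocations require adaptive_enabled to be set")
    else:
        adaptive: Dict[str, Any] = {"enabled": adaptive_enabled}
        if min_allocations is not None:
            if min_allocations < 0:
                raise ValueError("min_number_of_allocations must be >= 0")
            adaptive["min_number_of_allocations"] = min_allocations
        if max_allocations is not None:
            # When no minimum is given, the deployment may scale to 0.
            lower_bound = min_allocations if min_allocations is not None else 0
            if max_allocations < lower_bound:
                raise ValueError(
                    "max_number_of_allocations must be >= min_number_of_allocations"
                )
            adaptive["max_number_of_allocations"] = max_allocations
        body["adaptive_allocations"] = adaptive

    if number_of_allocations is not None:
        if adaptive_enabled:
            raise ValueError(
                "do not set number_of_allocations when adaptive allocations are enabled"
            )
        body["number_of_allocations"] = number_of_allocations

    return body


fixed_body = build_update_body(number_of_allocations=4)
adaptive_body = build_update_body(
    adaptive_enabled=True, min_allocations=1, max_allocations=8
)
```

The resulting dictionaries could be passed as the JSON body of the update request. The server performs its own validation, so a client-side check like this only catches mistakes earlier.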

Responses

  • 200 application/json
    • assignment object Required
      • adaptive_allocations object | string | null

      • assignment_state string Required

        The overall assignment state.

        Supported values include:

        • started: The deployment is usable; at least one node has the model allocated.
        • starting: The deployment has recently started but is not yet usable; the model is not allocated on any nodes.
        • stopping: The deployment is preparing to stop and deallocate the model from the relevant nodes.
        • failed: The deployment is in a failed state and must be redeployed.

        Values are started, starting, stopping, or failed.

      • max_assigned_allocations number
      • reason string
      • routing_table object Required

        The allocation state for each node.

        • * object Additional properties
          • reason string

            The reason for the current state. It is usually populated only when the routing_state is failed.

          • routing_state string Required

            The current routing state.

            Supported values include:

            • failed: The allocation attempt failed.
            • started: The trained model is allocated and ready to accept inference requests.
            • starting: The trained model is attempting to allocate on this node; inference requests are not yet accepted.
            • stopped: The trained model is fully deallocated from this node.
            • stopping: The trained model is being deallocated from this node.

            Values are failed, started, starting, stopped, or stopping.

          • current_allocations number Required

            Current number of allocations.

          • target_allocations number Required

            Target number of allocations.

      • start_time string | number

        The timestamp when the deployment started.

      • task_parameters object Required
        • model_bytes
        • model_id string Required

          The unique identifier for the trained model.

        • deployment_id string Required

          The unique identifier for the trained model deployment.

        • cache_size
        • number_of_allocations number Required

          The total number of allocations this model is assigned across ML nodes.

        • priority string Required

          Values are normal or low.

        • per_deployment_memory_bytes
        • per_allocation_memory_bytes
        • queue_capacity number Required

          The number of inference requests allowed in the queue at a time.

        • threads_per_allocation number Required

          Number of threads per allocation.
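As an illustration of how the response fields fit together, the sketch below summarizes an assignment object: it totals current_allocations and target_allocations across the routing table and collects the reason for any node whose routing_state is failed. The helper name `summarize_assignment` is hypothetical, and the sample response is made-up data shaped like the attributes listed above, not output from a real cluster.

```python
from typing import Any, Dict


def summarize_assignment(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: condense the routing table of an update response."""
    assignment = response["assignment"]
    routing_table = assignment["routing_table"]

    current = sum(entry["current_allocations"] for entry in routing_table.values())
    target = sum(entry["target_allocations"] for entry in routing_table.values())

    # "reason" is optional and usually populated only for failed routes.
    failed_nodes = {
        node_id: entry.get("reason", "")
        for node_id, entry in routing_table.items()
        if entry["routing_state"] == "failed"
    }

    return {
        "assignment_state": assignment["assignment_state"],
        "current_allocations": current,
        "target_allocations": target,
        "at_target": current == target,
        "failed_nodes": failed_nodes,
    }


# Made-up sample containing only the fields this sketch reads.
sample_response = {
    "assignment": {
        "assignment_state": "started",
        "routing_table": {
            "node-a": {
                "routing_state": "started",
                "current_allocations": 2,
                "target_allocations": 2,
            },
            "node-b": {
                "routing_state": "starting",
                "current_allocations": 1,
                "target_allocations": 2,
            },
        },
    }
}

summary = summarize_assignment(sample_response)
```

A check like `summary["at_target"]` could be polled after an update to see whether the deployment has reached the requested number of allocations.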

Console
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_update
{
  "number_of_allocations": 4
}

Python
resp = client.ml.update_trained_model_deployment(
    model_id="elastic__distilbert-base-uncased-finetuned-conll03-english",
    number_of_allocations=4,
)

JavaScript
const response = await client.ml.updateTrainedModelDeployment({
  model_id: "elastic__distilbert-base-uncased-finetuned-conll03-english",
  number_of_allocations: 4,
});

Ruby
response = client.ml.update_trained_model_deployment(
  model_id: "elastic__distilbert-base-uncased-finetuned-conll03-english",
  body: {
    "number_of_allocations": 4
  }
)

PHP
$resp = $client->ml()->updateTrainedModelDeployment([
    "model_id" => "elastic__distilbert-base-uncased-finetuned-conll03-english",
    "body" => [
        "number_of_allocations" => 4,
    ],
]);

curl
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"number_of_allocations":4}' "$ELASTICSEARCH_URL/_ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_update"

Java
client.ml().updateTrainedModelDeployment(u -> u
    .modelId("elastic__distilbert-base-uncased-finetuned-conll03-english")
    .numberOfAllocations(4)
);

Request example
An example body for a `POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_update` request.
{
  "number_of_allocations": 4
}
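
For environments without an official client, the same request could be assembled with Python's standard library alone. This is a sketch under stated assumptions: `build_update_request` is a hypothetical helper, and the base URL and API key are placeholders. The request object is only built here, not sent, because sending it needs a reachable cluster.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def build_update_request(
    base_url: str, api_key: str, model_id: str, body: Dict[str, Any]
) -> urllib.request.Request:
    """Hypothetical helper: build the POST request for the deployment update endpoint."""
    # Percent-encode the model ID so it is safe to place in the URL path.
    path = "/_ml/trained_models/{}/deployment/_update".format(
        urllib.parse.quote(model_id, safe="")
    )
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "ApiKey " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder URL and key; replace them with real values before sending.
request = build_update_request(
    "https://localhost:9200",
    "example-key",
    "elastic__distilbert-base-uncased-finetuned-conll03-english",
    {"number_of_allocations": 4},
)
# To send it: urllib.request.urlopen(request), which requires a running cluster.
```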