Tune Gemini models by using supervised fine-tuning

This page shows you how to create and manage supervised fine-tuning jobs for Gemini models, and how to evaluate, use, and delete the resulting tuned models.

Before you begin

Before you can tune a model, you need to prepare a supervised fine-tuning dataset. The dataset requirements depend on your use case.
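
Each line in a supervised fine-tuning dataset is a JSON object that uses the same contents format as Gemini requests. The following single-line record is an illustrative sketch; the prompt and response texts are placeholders, and the full format requirements are described in About supervised tuning datasets:

{"contents": [{"role": "user", "parts": [{"text": "Why is the sky blue?"}]}, {"role": "model", "parts": [{"text": "The sky looks blue because molecules in the air scatter blue light from the sun more than other wavelengths."}]}]}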

Supported models

The following Gemini models support supervised tuning:

Create a tuning job

You can create a supervised fine-tuning job by using several methods. The following table can help you decide which option is best for your use case.

Tool | Description | Use case
Google Cloud console | Create and manage tuning jobs with a graphical user interface, without writing code. | Ideal for beginners, quick experiments, or users who prefer a visual interface.
Google Gen AI SDK / Vertex AI SDK for Python | Use Python for programmatic control and integration into automated workflows and applications. | Best for developers and MLOps engineers who need to automate, customize, and integrate tuning into larger systems.
REST API | Interact directly with the tuning service over HTTP. | Suitable for integration with any programming language or environment that can make HTTP requests.
Colab Enterprise | Use an interactive notebook environment with UI helpers that generate tuning code snippets. | Good for experimentation and development within a familiar notebook interface, combining code with UI assistance.

You can create a supervised fine-tuning job by using the Google Cloud console, the Google Gen AI SDK, the Vertex AI SDK for Python, the REST API, or Colab Enterprise:

Console

To tune a text model with supervised fine-tuning by using the Google Cloud console:

  1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. Click Create tuned model.

  3. Under Model details, configure the following:

    1. In the Tuned model name field, enter a name for your new tuned model, up to 128 characters.
    2. In the Base model field, select gemini-2.5-flash.
    3. In the Region drop-down list, select the region where the pipeline tuning job runs and where the tuned model is deployed.
  4. Under Tuning setting, configure the following:

    1. In the Number of epochs field, enter the number of complete passes that the model makes over the entire training dataset.
    2. In the Adapter Size field, enter the adapter size to use for model tuning.
    3. In the Learning rate multiplier field, enter a multiplier to apply to the recommended learning rate. The default value is 1.
  5. Optional: To disable intermediate checkpoints and use only the latest checkpoint, click the Export last checkpoint only toggle.

  6. Click Continue.

    The Tuning dataset page opens.

  7. To upload a dataset file, select one of the following options:

    1. If you haven't uploaded a dataset yet, select the Upload file to Cloud Storage radio button.
      1. In the Select JSONL file field, click Browse and select your dataset file.
      2. In the Dataset location field, click Browse and select the Cloud Storage bucket where you want to store your dataset file.
    2. If your dataset file is already in a Cloud Storage bucket, select the Existing file on Cloud Storage radio button.
      1. In the Cloud Storage file path field, click Browse and select the Cloud Storage bucket where your dataset file is located.
  8. Optional: To get validation metrics during training, click the Enable model validation toggle.

    1. In the Validation dataset file field, enter the Cloud Storage path of your validation dataset.
  9. Click Start Tuning.

    Your new model appears in the Gemini Pro tuned models section on the Tune and Distill page. When the model finishes tuning, the Status is Succeeded.

Google Gen AI SDK

import time

from google import genai
from google.genai.types import HttpOptions, CreateTuningJobConfig, TuningDataset, EvaluationConfig, OutputConfig, GcsDestination, Metric

# TODO(developer): Update and un-comment below line
# output_gcs_uri = "gs://your-bucket/your-prefix"

client = genai.Client(http_options=HttpOptions(api_version="v1beta1"))

training_dataset = TuningDataset(
    gcs_uri="gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_train_data.jsonl",
)
validation_dataset = TuningDataset(
    gcs_uri="gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_validation_data.jsonl",
)

evaluation_config = EvaluationConfig(
    metrics=[
        Metric(
            name="FLUENCY",
            prompt_template="""Evaluate this {response}"""
        )
    ],
    output_config=OutputConfig(
        gcs_destination=GcsDestination(
            output_uri_prefix=output_gcs_uri,
        )
    ),
)

tuning_job = client.tunings.tune(
    base_model="gemini-2.5-flash",
    training_dataset=training_dataset,
    config=CreateTuningJobConfig(
        tuned_model_display_name="Example tuning job",
        validation_dataset=validation_dataset,
        evaluation_config=evaluation_config,
    ),
)

running_states = set([
    "JOB_STATE_PENDING",
    "JOB_STATE_RUNNING",
])

# Poll until the tuning job reaches a terminal state.
while tuning_job.state in running_states:
    print(tuning_job.state)
    tuning_job = client.tunings.get(name=tuning_job.name)
    time.sleep(60)

print(tuning_job.tuned_model.model)
print(tuning_job.tuned_model.endpoint)
print(tuning_job.experiment)
# Example response:
# projects/123456789012/locations/us-central1/models/1234567890@1
# projects/123456789012/locations/us-central1/endpoints/123456789012345
# projects/123456789012/locations/us-central1/metadataStores/default/contexts/tuning-experiment-2025010112345678

if tuning_job.tuned_model.checkpoints:
    for i, checkpoint in enumerate(tuning_job.tuned_model.checkpoints):
        print(f"Checkpoint {i + 1}: ", checkpoint)
    # Example response:
    # Checkpoint 1:  checkpoint_id='1' epoch=1 step=10 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789000000'
    # Checkpoint 2:  checkpoint_id='2' epoch=2 step=20 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789012345'
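
Each checkpoint has its own endpoint, so you can compare checkpoints by sending the same prompt to each one, for example with client.models.generate_content(model=checkpoint.endpoint, contents=...).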

Vertex AI SDK for Python


import time

import vertexai
from vertexai.tuning import sft

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

sft_tuning_job = sft.train(
    source_model="gemini-2.0-flash-001",
    # 1.5 and 2.0 models use the same JSONL format
    train_dataset="gs://cloud-samples-data/ai-platform/generative_ai/gemini-1_5/text/sft_train_data.jsonl",
)

# Polling for job completion
while not sft_tuning_job.has_ended:
    time.sleep(60)
    sft_tuning_job.refresh()

print(sft_tuning_job.tuned_model_name)
print(sft_tuning_job.tuned_model_endpoint_name)
print(sft_tuning_job.experiment)
# Example response:
# projects/123456789012/locations/us-central1/models/1234567890@1
# projects/123456789012/locations/us-central1/endpoints/123456789012345
# <google.cloud.aiplatform.metadata.experiment_resources.Experiment object at 0x7b5b4ae07af0>

REST

To create a model tuning job, send a POST request by using the tuningJobs.create method. Some parameters aren't supported by all models. Make sure that you include only the parameters that apply to the model that you're tuning.

Optional (Preview): Include the evaluationConfig to automatically run an evaluation using the Gen AI evaluation service after the tuning job completes. This evaluation configuration is available in the us-central1 region.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
  • BASE_MODEL: Name of the foundation model to tune.
  • TRAINING_DATASET_URI: Cloud Storage URI of your training dataset. The dataset must be formatted as a JSONL file. For best results, provide at least 100 to 500 examples. For more information, see About supervised tuning datasets.
  • VALIDATION_DATASET_URI: Optional. The Cloud Storage URI of your validation dataset file.
  • EPOCH_COUNT: Optional. The number of complete passes the model makes over the entire training dataset during training. Leave it unset to use the pre-populated recommended value.
  • ADAPTER_SIZE: Optional. The adapter size to use for the tuning job. The adapter size influences the number of trainable parameters for the tuning job. A larger adapter size implies that the model can learn more complex tasks, but it requires a larger training dataset and longer training times.
  • LEARNING_RATE_MULTIPLIER: Optional. A multiplier to apply to the recommended learning rate. Leave it unset to use the recommended value.
  • EXPORT_LAST_CHECKPOINT_ONLY: Optional. Set to true to use only the latest checkpoint.
  • METRIC_SPEC: Optional. One or more metric specs to use for an evaluation run by the Gen AI evaluation service. You can use the following metric specs: "pointwise_metric_spec", "pairwise_metric_spec".
  • METRIC_SPEC_FIELD_NAME: Optional. The required fields for your chosen metric spec. For example, "metric_prompt_template".
  • METRIC_SPEC_FIELD_CONTENT: Optional. The field content for your chosen metric spec. For example, you can use the following field content for a pointwise evaluation: "Evaluate the fluency of this sentence: {response}. Give score from 0 to 1. 0 - not fluent at all. 1 - very fluent."
  • CLOUD_STORAGE_BUCKET: Optional. The Cloud Storage bucket to store the results of an evaluation run by the Gen AI evaluation service.
  • TUNED_MODEL_DISPLAYNAME: Optional. A display name for the tuned model. If not set, a random name is generated.
  • KMS_KEY_NAME: Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect a resource. The key has the format projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key must be in the same region as where the compute resource is created. For more information, see Customer-managed encryption keys (CMEK).
  • SERVICE_ACCOUNT: Optional. The service account that the tuningJob workload runs as. If not specified, the Vertex AI Secure Fine-Tuning Service Agent in the project is used. See Tuning Service Agent. If you plan to use a customer-managed service account, you must grant the roles/aiplatform.tuningServiceAgent role to the service account. Also grant the Tuning Service Agent the roles/iam.serviceAccountTokenCreator role on the customer-managed service account.

HTTP method and URL:

POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs

Request JSON body:

{
  "baseModel": "BASE_MODEL",
  "supervisedTuningSpec" : {
      "trainingDatasetUri": "TRAINING_DATASET_URI",
      "validationDatasetUri": "VALIDATION_DATASET_URI",
      "hyperParameters": {
          "epochCount": "EPOCH_COUNT",
          "adapterSize": "ADAPTER_SIZE",
          "learningRateMultiplier": "LEARNING_RATE_MULTIPLIER"
      },
      "exportLastCheckpointOnly": EXPORT_LAST_CHECKPOINT_ONLY,
      "evaluationConfig": {
          "metrics": [
              {
                  "aggregation_metrics": ["AVERAGE", "STANDARD_DEVIATION"],
                  "METRIC_SPEC": {
                      "METRIC_SPEC_FIELD_NAME":
                          METRIC_SPEC_FIELD_CONTENT
                  }
              }
          ],
          "outputConfig": {
              "gcs_destination": {
                  "output_uri_prefix": "CLOUD_STORAGE_BUCKET"
              }
          }
      }
  },
  "tunedModelDisplayName": "TUNED_MODEL_DISPLAYNAME",
  "encryptionSpec": {
    "kmsKeyName": "KMS_KEY_NAME"
  },
  "serviceAccount": "SERVICE_ACCOUNT"
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs" | Select-Object -Expand Content

If the request is successful, the response body contains a JSON representation of the new tuning job.

Example curl command

PROJECT_ID=myproject
LOCATION=us-central1
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/tuningJobs" \
-d \
$'{
   "baseModel": "gemini-2.5-flash",
   "supervisedTuningSpec" : {
      "training_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_train_data.jsonl",
      "validation_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_validation_data.jsonl"
   },
   "tunedModelDisplayName": "tuned_gemini"
}'

Colab Enterprise

In Colab Enterprise, you can create a model tuning job by using the side panel. The side panel adds the relevant code snippets to your notebook. You can then modify and run the code snippets to create your tuning job. To learn more about using the side panel with your Vertex AI tuning jobs, see Interact with Vertex AI to tune a model.

  1. In the Google Cloud console, go to the Colab Enterprise My notebooks page.

    Go to My notebooks

  2. In the Region menu, select the region that contains your notebook.

  3. Click the notebook that you want to open. If you haven't created a notebook yet, create a notebook.

  4. To the right of your notebook, in the side panel, click the Tuning button.

    The side panel expands the Tuning tab.

  5. Click the Tune a Gemini model button.

    Colab Enterprise adds code cells to your notebook for tuning a Gemini model.

  6. In your notebook, find the code cell that stores parameter values. You'll use these parameters to interact with Vertex AI.

  7. Update the values for the following parameters:

    • PROJECT_ID: The ID of the project that your notebook is in.
    • REGION: The region that your notebook is in.
    • TUNED_MODEL_DISPLAY_NAME: The name of your tuned model.
  8. In the next code cell, update the model tuning parameters (a sketch of this cell appears after these steps):

    • source_model: The Gemini model that you want to use, for example, gemini-2.0-flash-001.
    • train_dataset: The URL of your training dataset.
    • validation_dataset: The URL of your validation dataset.
    • Adjust the remaining parameters as needed.
  9. Run the code cells that the side panel added to your notebook.

  10. After the last code cell runs, click the View tuning job button that appears.

  11. The side panel shows information about your model tuning job.

    • The Monitor tab shows tuning metrics when the metrics are ready.
    • The Dataset tab shows a summary and metrics about your dataset after the dataset has been processed.
    • The Details tab shows information about your tuning job, such as the tuning method and the base model (source model) that you used.
  12. After the tuning job has completed, you can go directly from the Details tab to a page where you can test your model. Click Test.

    The Google Cloud console opens to the Vertex AI Text chat page, where you can test your model.
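
The generated cells depend on your SDK version, but they typically resemble the following sketch, which uses the Vertex AI SDK for Python. All values shown are illustrative placeholders; replace them with your own:

import vertexai
from vertexai.tuning import sft

# Illustrative placeholder values; use your own project, region, and names.
PROJECT_ID = "your-project-id"
REGION = "us-central1"
TUNED_MODEL_DISPLAY_NAME = "my-tuned-gemini"

vertexai.init(project=PROJECT_ID, location=REGION)

# Model tuning parameters.
sft_tuning_job = sft.train(
    source_model="gemini-2.0-flash-001",
    train_dataset="gs://your-bucket/sft_train_data.jsonl",
    validation_dataset="gs://your-bucket/sft_validation_data.jsonl",
    tuned_model_display_name=TUNED_MODEL_DISPLAY_NAME,
)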

Tuning hyperparameters

We recommend that you submit your first tuning job without changing the default hyperparameters. The default values are based on benchmarking results and typically yield the best model quality.

  • Epochs: The number of complete passes the model makes over the entire training dataset. Vertex AI automatically adjusts the default value based on your training dataset size to optimize model quality.
  • Adapter size: The size of the adapter to use for the tuning job. The adapter size influences the number of trainable parameters. A larger adapter size lets the model learn more complex tasks, but it requires a larger training dataset and longer training times.
  • Learning rate multiplier: A multiplier to apply to the recommended learning rate. You can increase the value to converge faster or decrease it to avoid overfitting.
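
If a first run with the defaults underfits or overfits, you can override these values when you create the job. The following is a minimal sketch with the Google Gen AI SDK, assuming the epoch_count, adapter_size, and learning_rate_multiplier fields on CreateTuningJobConfig; the override values are illustrative, not recommendations:

from google import genai
from google.genai.types import CreateTuningJobConfig, HttpOptions, TuningDataset

client = genai.Client(http_options=HttpOptions(api_version="v1"))

tuning_job = client.tunings.tune(
    base_model="gemini-2.5-flash",
    training_dataset=TuningDataset(
        gcs_uri="gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_train_data.jsonl",
    ),
    config=CreateTuningJobConfig(
        tuned_model_display_name="Example custom-hyperparameter job",
        epoch_count=4,                     # complete passes over the training dataset
        adapter_size="ADAPTER_SIZE_FOUR",  # trainable-parameter budget
        learning_rate_multiplier=0.5,      # scales the recommended learning rate
    ),
)
print(tuning_job.name)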

For a discussion of best practices for supervised fine-tuning, see the blog post Supervised Fine Tuning for Gemini: A best practices guide.

Manage tuning jobs

View a list of tuning jobs

To view a list of tuning jobs in your current project, you can use the Google Cloud console, the Google Gen AI SDK, the Vertex AI SDK for Python, or send a GET request by using the tuningJobs method.

Console

To view your tuning jobs in the Google Cloud console, go to the Vertex AI Studio page.

Go to Vertex AI Studio

Your Gemini tuning jobs are listed in the table in the Gemini Pro tuned models section.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

responses = client.tunings.list()
for response in responses:
    print(response.name)
    # Example response:
    # projects/123456789012/locations/us-central1/tuningJobs/123456789012345

Vertex AI SDK for Python

import vertexai
from vertexai.tuning import sft

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

responses = sft.SupervisedTuningJob.list()

for response in responses:
    print(response)
# Example response:
# <vertexai.tuning._supervised_tuning.SupervisedTuningJob object at 0x7c85287b2680>
# resource name: projects/12345678/locations/us-central1/tuningJobs/123456789012345

REST

To view a list of model tuning jobs, send a GET request by using the tuningJobs.list method.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.

HTTP method and URL:

GET https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs" | Select-Object -Expand Content

If the request is successful, the response body contains a JSON list of the tuning jobs in your project.

Get details of a tuning job

To get the details of a tuning job in your current project, you can use the Google Cloud console, the Google Gen AI SDK, the Vertex AI SDK for Python, or send a GET request by using the tuningJobs method.

Console

  1. To view details of a tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Gemini Pro tuned models table, find your model and click Details.

    The details for your model are displayed.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# Eg. tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)

print(tuning_job.tuned_model.model)
print(tuning_job.tuned_model.endpoint)
print(tuning_job.experiment)
# Example response:
# projects/123456789012/locations/us-central1/models/1234567890@1
# projects/123456789012/locations/us-central1/endpoints/123456789012345
# projects/123456789012/locations/us-central1/metadataStores/default/contexts/tuning-experiment-2025010112345678

Vertex AI SDK for Python

import vertexai
from vertexai.tuning import sft

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# LOCATION = "us-central1"
vertexai.init(project=PROJECT_ID, location=LOCATION)

tuning_job_id = "4982013113894174720"
response = sft.SupervisedTuningJob(
    f"projects/{PROJECT_ID}/locations/{LOCATION}/tuningJobs/{tuning_job_id}"
)

print(response)
# Example response:
# <vertexai.tuning._supervised_tuning.SupervisedTuningJob object at 0x7cc4bb20baf0>
# resource name: projects/1234567890/locations/us-central1/tuningJobs/4982013113894174720

REST

To get the details of a model tuning job, send a GET request by using the tuningJobs.get method and specify the TUNING_JOB_ID.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
  • TUNING_JOB_ID: The ID of the tuning job.

HTTP method and URL:

GET https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID" | Select-Object -Expand Content

If the request is successful, the response body contains a JSON representation of the tuning job.

Cancel a tuning job

To cancel a tuning job in your current project, you can use the Google Cloud console, the Vertex AI SDK for Python, or the REST API.

Console

  1. To cancel a tuning job in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Gemini Pro tuned models table, click Manage run.

  3. Click Cancel.

Vertex AI SDK for Python

import vertexai
from vertexai.tuning import sft

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# LOCATION = "us-central1"
vertexai.init(project=PROJECT_ID, location=LOCATION)

tuning_job_id = "4982013113894174720"
job = sft.SupervisedTuningJob(
    f"projects/{PROJECT_ID}/locations/{LOCATION}/tuningJobs/{tuning_job_id}"
)
job.cancel()

REST

To cancel a model tuning job, send a POST request by using the tuningJobs.cancel method and specify the TUNING_JOB_ID.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
  • TUNING_JOB_ID: The ID of the tuning job.

HTTP method and URL:

POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel

To send your request, choose one of these options:

curl

Execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d "" \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs/TUNING_JOB_ID:cancel" | Select-Object -Expand Content

If the request is successful, the response body confirms that the tuning job is being cancelled.

Evaluate and use your tuned model

Evaluate the tuned model

If you didn't configure the Gen AI evaluation service to run automatically after the tuning job, you can interact with the tuned model endpoint in the same way as the base Gemini model. You can use the Vertex AI SDK for Python, the Google Gen AI SDK, or send a POST request by using the generateContent method.

For thinking models like Gemini 2.5 Flash, we recommend that you set the thinking budget to 0 to turn off thinking on tuned tasks for optimal performance and cost efficiency. During supervised fine-tuning, the model learns to mimic the ground truth in the tuning dataset and omits the thinking process. Therefore, the tuned model can handle the task effectively without a thinking budget.
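
For example, with the Google Gen AI SDK you can turn off thinking for a single request by setting a thinking budget of 0 in the request config. This is a minimal sketch; the endpoint resource name is a placeholder:

from google import genai
from google.genai.types import GenerateContentConfig, HttpOptions, ThinkingConfig

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Placeholder: replace with your tuned model endpoint resource name.
tuned_endpoint = "projects/123456789012/locations/us-central1/endpoints/123456789012345"

response = client.models.generate_content(
    model=tuned_endpoint,
    contents="Why is the sky blue?",
    # Turn off thinking for the tuned task.
    config=GenerateContentConfig(thinking_config=ThinkingConfig(thinking_budget=0)),
)
print(response.text)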

The following example shows how to prompt a model with the question "Why is the sky blue?".

Console

  1. To view details of a tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Gemini Pro tuned models table, select Test.

    A page opens where you can create a conversation with your tuned model.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# Eg. tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)

contents = "Why is the sky blue?"

# Predicts with the tuned endpoint.
response = client.models.generate_content(
    model=tuning_job.tuned_model.endpoint,
    contents=contents,
)
print(response.text)
# Example response:
# The sky is blue because ...

Vertex AI SDK for Python

from vertexai.generative_models import GenerativeModel
from vertexai.tuning import sft

# Load the tuning job from its full resource name.
sft_tuning_job = sft.SupervisedTuningJob(
    "projects/<PROJECT_ID>/locations/<TUNING_JOB_REGION>/tuningJobs/<TUNING_JOB_ID>"
)
tuned_model = GenerativeModel(sft_tuning_job.tuned_model_endpoint_name)
print(tuned_model.generate_content("Why is the sky blue?"))

REST

To test a tuned model with a prompt, send a POST request and specify the ENDPOINT_ID of your tuned model.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • TUNING_JOB_REGION: The region where the tuning job runs. This is also the default region for where the tuned model is uploaded.
  • ENDPOINT_ID: The tuned model endpoint ID from the GET API.
  • TEMPERATURE: The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.

    If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.

  • TOP_P: Top-P changes how the model selects tokens for output. Tokens are selected from the most probable to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model will select either A or B as the next token by using temperature and excludes C as a candidate.

    Specify a lower value for less random responses and a higher value for more random responses.

  • TOP_K: Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

    For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.

    Specify a lower value for less random responses and a higher value for more random responses.

  • MAX_OUTPUT_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.

    Specify a lower value for shorter responses and a higher value for potentially longer responses.

HTTP method and URL:

POST https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent

Request JSON body:

{
    "contents": [
        {
            "role": "USER",
            "parts": [
                {
                    "text": "Why is the sky blue?"
                }
            ]
        }
    ],
    "generationConfig": {
        "temperature": TEMPERATURE,
        "topP": TOP_P,
        "topK": TOP_K,
        "maxOutputTokens": MAX_OUTPUT_TOKENS
    }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/endpoints/ENDPOINT_ID:generateContent" | Select-Object -Expand Content

If the request is successful, the response body contains the generated content.

Delete a tuned model

To delete a tuned model:

REST

Call the models.delete method.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • REGION: The region where the tuned model is located.
  • MODEL_ID: The model to delete.

HTTP method and URL:

DELETE https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://REGION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID" | Select-Object -Expand Content

You should receive a successful status code (2xx) and an empty response.

Vertex AI SDK for Python

from google.cloud import aiplatform

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# LOCATION = "us-central1"
# MODEL_ID = "your-model-id"
aiplatform.init(project=PROJECT_ID, location=LOCATION)

# List the models in Model Registry to find the ID of the model to delete.
models = aiplatform.Model.list()

model = aiplatform.Model(MODEL_ID)
model.delete()

Tuning and validation metrics

You can configure a model tuning job to collect and report model tuning and model evaluation metrics. You can then visualize these metrics in Vertex AI Studio.

  1. To view details of a tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tune and Distill table, click the name of the tuned model that you want to view metrics for.

    The tuning metrics appear on the Monitor tab.

Model tuning metrics

The model tuning job automatically collects the following tuning metrics for Gemini 2.0 Flash:

  • /train_total_loss: Loss for the tuning dataset at a training step.
  • /train_fraction_of_correct_next_step_preds: The token accuracy at a training step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the tuning dataset.
  • /train_num_predictions: Number of predicted tokens at a training step.

Model validation metrics

You can configure a model tuning job to collect the following validation metrics for Gemini 2.0 Flash:

  • /eval_total_loss: Loss for the validation dataset at a validation step.
  • /eval_fraction_of_correct_next_step_preds: The token accuracy at a validation step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the validation dataset.
  • /eval_num_predictions: Number of predicted tokens at a validation step.

The metrics visualizations are available after the tuning job starts. The visualizations are updated in real time as tuning progresses. If you don't specify a validation dataset when you create the tuning job, only the tuning metrics visualizations are available.

What's next