# Count tokens for Claude models

This page shows you how to use the `count-tokens` endpoint to get the number of tokens in a message before you send it to a Claude model. You can use the token count to ensure your prompts don't exceed the model's context window.

There is no charge for using the `count-tokens` endpoint.
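For example, once you have the count returned by the endpoint (the request and response formats are shown later on this page), you can check it against the model's context window before sending the real request. The following is a minimal sketch only; the 200,000-token context window and the reserved output budget are assumptions that you should confirm for the specific Claude model you use.

```python
# Minimal sketch: gate a request on the token count returned by count-tokens.
# ASSUMPTION: a 200,000-token context window; confirm the limit for your model.
CONTEXT_WINDOW = 200_000
RESERVED_OUTPUT_TOKENS = 1_024  # room to leave for the model's reply


def fits_in_context(input_tokens: int) -> bool:
    """Return True if the counted prompt leaves room for the reply."""
    return input_tokens + RESERVED_OUTPUT_TOKENS <= CONTEXT_WINDOW


# input_tokens comes from the count-tokens response, e.g. {"input_tokens": 14}.
if not fits_in_context(input_tokens=14):
    raise ValueError("Prompt is too long; shorten it before calling the model.")
```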
Supported Claude models
-----------------------

The following models support count tokens:

- [Claude Opus 4.1](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-opus-4-1)
- [Claude Opus 4](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-opus-4)
- [Claude Sonnet 4](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-sonnet-4)
- [Claude 3.7 Sonnet](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-7-sonnet)
- [Claude 3.5 Sonnet v2](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-5-sonnet-v2)
- [Claude 3.5 Haiku](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-5-haiku)
- [Claude 3.5 Sonnet](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-5-sonnet)
- [Claude 3 Opus](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-opus)
- [Claude 3 Haiku](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-haiku)

Supported regions
-----------------

The following regions support count tokens:

- `us-east5`
- `europe-west1`
- `asia-east1`
- `asia-southeast1`
- `us-central1`
- `europe-west4`

Count tokens in basic messages
------------------------------

To count tokens, send a `rawPredict` request to the `count-tokens` endpoint. The body of the request must contain the model ID of the model you want to count tokens against.

### REST

Before using any of the request data, make the following replacements:

- `LOCATION`: A [region](#regions) that supports Anthropic Claude models. To use the global endpoint, see [Specify the global endpoint](/vertex-ai/generative-ai/docs/partner-models/use-partner-models#global).
- `PROJECT_ID`: Your Google Cloud project ID.
- `MODEL`: The [model](#model-list) to count tokens against.
- `ROLE`: The role associated with a message. You can specify a `user` or an `assistant`. The first message must use the `user` role. Claude models operate with alternating `user` and `assistant` turns. If the final message uses the `assistant` role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
- `CONTENT`: The content, such as text, of the `user` or `assistant` message.
HTTP method and URL:

```
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/count-tokens:rawPredict
```
Request JSON body:

```
{
  "model": "MODEL",
  "messages": [
    {
      "role": "user",
      "content": "how many tokens are in this request?"
    }
  ]
}
```
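As described in the `ROLE` replacement above, a request body can also contain alternating `user` and `assistant` turns, with a final `assistant` message acting as a prefill that the model continues. The sketch below, which writes such a body to `request.json` for use with the commands that follow, is illustrative only; the message content and the `MODEL` placeholder are not from a real request.

```python
# Minimal sketch: write a multi-turn request body (with an assistant prefill)
# to request.json so it can be sent with the curl or PowerShell commands below.
# The model ID and message content are illustrative placeholders.
import json

body = {
    "model": "MODEL",
    "messages": [
        {"role": "user", "content": "Summarize the plot of Hamlet in one sentence."},
        {"role": "assistant", "content": "Hamlet, Prince of Denmark,"},
    ],
}

with open("request.json", "w", encoding="utf-8") as f:
    json.dump(body, f, indent=2)
```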
To send your request, choose one of these options:

#### curl

**Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login), or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI. You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).

Save the request body in a file named `request.json`, and execute the following command:
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-27 UTC."],[],[],null,["# Count tokens for Claude models\n\nThe `count-tokens` endpoint lets you determine the number of tokens in a\nmessage before sending it to Claude, helping you make informed decisions about\nyour prompts and usage.\n\nThere is no cost for using the `count-tokens` endpoint.\n\nSupported Claude models\n-----------------------\n\nThe following models support count tokens:\n\n- [Claude Opus 4.1](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-opus-4-1)\n- [Claude Opus 4](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-opus-4)\n- [Claude Sonnet 4](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-sonnet-4)\n- [Claude 3.7 Sonnet](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-7-sonnet)\n- [Claude 3.5 Sonnet v2](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-5-sonnet-v2)\n- [Claude 3.5 Haiku](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-5-haiku)\n- [Claude 3.5 Sonnet](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-5-sonnet)\n- [Claude 3 Opus](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-opus)\n- [Claude 3 Haiku](https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-haiku)\n\n\u003cbr /\u003e\n\nSupported regions\n-----------------\n\nThe following regions support count tokens:\n\n- `us-east5`\n- `europe-west1`\n- `asia-east1`\n- `asia-southeast1`\n- `us-central1`\n- `europe-west4`\n\nCount tokens in basic messages\n------------------------------\n\nTo count tokens, send a `rawPredict` request to the `count-tokens` endpoint. The\nbody of the request must contain the model ID of the model you want to count\ntokens against. \n\n### REST\n\n\nBefore using any of the request data,\nmake the following replacements:\n\n- \u003cvar class=\"edit\" scope=\"LOCATION\" translate=\"no\"\u003eLOCATION\u003c/var\u003e: A [region](#regions) that supports Anthropic Claude models. To use the global endpoint, see [Specify\n the global endpoint](/vertex-ai/generative-ai/docs/partner-models/use-partner-models#global).\n- \u003cvar class=\"edit\" scope=\"MODEL\" translate=\"no\"\u003eMODEL\u003c/var\u003e: The [model](#model-list) to count tokens against.\n- \u003cvar translate=\"no\"\u003eROLE\u003c/var\u003e: The role associated with a message. You can specify a `user` or an `assistant`. The first message must use the `user` role. Claude models operate with alternating `user` and `assistant` turns. If the final message uses the `assistant` role, then the response content continues immediately from the content in that message. 
You can use this to constrain part of the model's response.\n- \u003cvar translate=\"no\"\u003eCONTENT\u003c/var\u003e: The content, such as text, of the `user` or `assistant` message.\n\n\nHTTP method and URL:\n\n```\nPOST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/count-tokens:rawPredict\n```\n\n\nRequest JSON body:\n\n```\n{\n \"model\": \"MODEL\",\n \"messages\": [\n {\n \"role\": \"user\",\n \"content\":\"how many tokens are in this request?\"\n }\n ],\n}\n```\n\nTo send your request, choose one of these options: \n\n#### curl\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) , or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\ncurl -X POST \\\n -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n -H \"Content-Type: application/json; charset=utf-8\" \\\n -d @request.json \\\n \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/count-tokens:rawPredict\"\n```\n\n#### PowerShell\n\n| **Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login) . You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).\n\n\nSave the request body in a file named `request.json`,\nand execute the following command:\n\n```\n$cred = gcloud auth print-access-token\n$headers = @{ \"Authorization\" = \"Bearer $cred\" }\n\nInvoke-WebRequest `\n -Method POST `\n -Headers $headers `\n -ContentType: \"application/json; charset=utf-8\" `\n -InFile request.json `\n -Uri \"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/count-tokens:rawPredict\" | Select-Object -Expand Content\n```\n\nYou should receive a JSON response similar to the following.\n\n#### Response\n\n```\n{ \"input_tokens\": 14 }\n```\n\n\u003cbr /\u003e\n\nFor information on how to count tokens in messages with tools, images, and PDFs,\nsee [Anthropic's documentation](https://docs.anthropic.com/en/docs/build-with-claude/token-counting).\n\nQuotas\n------\n\nBy default, the quota for the `count-tokens` endpoint is 2000 requests per\nminute."]]
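If you prefer to call the endpoint from a script rather than from curl or PowerShell, the following Python sketch sends the same `rawPredict` request using the `google-auth` and `requests` libraries. It is a minimal, unofficial example: the project ID, region, and model ID are placeholders you must replace, and error handling beyond `raise_for_status` is omitted.

```python
# Minimal sketch: call the count-tokens endpoint with google-auth + requests.
# Placeholders (PROJECT_ID, LOCATION, MODEL) must be replaced with your values.
import google.auth
import google.auth.transport.requests
import requests

PROJECT_ID = "your-project-id"   # placeholder
LOCATION = "us-east5"            # a region that supports count tokens
MODEL = "claude-sonnet-4"        # placeholder model ID; confirm the exact ID

# Obtain an access token for the active credentials (the equivalent of
# `gcloud auth print-access-token` in the curl example above).
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/publishers/anthropic/models/count-tokens:rawPredict"
)
body = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "how many tokens are in this request?"}
    ],
}

response = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {credentials.token}",
        "Content-Type": "application/json; charset=utf-8",
    },
    json=body,
    timeout=30,
)
response.raise_for_status()
print(response.json()["input_tokens"])  # e.g. 14
```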