TensorZero

Technology, Information and Internet

About us

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

Website
https://www.tensorzero.com/
Industry
Technology, Information and Internet
Company size
2-10 employees
Type
Privately Held

Updates

  • TensorZero 2026.1.7 is out! 📌 This release introduces the preview of TensorZero Autopilot — our automated AI engineer (learn more on tensorzero.com).
    Full Changelog:
    🆕 [Preview] TensorZero Autopilot — an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests.
    🆕 Support multi-turn reasoning for xAI (`reasoning_content` only).
    & multiple under-the-hood and UI improvements!
    https://lnkd.in/enKiwP8h

  • TensorZero 2026.1.6 is out! 📌 This release brings further improvements around reasoning models, error handling, and usage tracking.
    Full Changelog:
    🚨 [Breaking Change] Moving forward, TensorZero will use the OpenAI API's error format (`{"error": {"message": "Bad!"}}`) instead of TensorZero's error format (`{"error": "Bad!"}`) in the OpenAI-compatible endpoints.
    ⚠️ [Planned Deprecation] When using `unstable_error_json` with the OpenAI-compatible inference endpoint, use `tensorzero_error_json` instead of `error_json`. For now, TensorZero will emit both fields with identical data. The TensorZero inference endpoint is not affected.
    🆕 Add native support for provider tools (e.g. web search) to the Anthropic and GCP Vertex AI Anthropic model providers. Previously, clients had to use `extra_body` to handle these tools.
    🆕 Improve handling of reasoning content blocks when streaming with the OpenAI Responses API.
    🆕 Handle inferences with missing `usage` fields gracefully in the OpenAI model provider.
    🆕 Improve error handling across the UI.
    & multiple under-the-hood and UI improvements!
    https://lnkd.in/edNXnz_X

  • TensorZero 2026.1.5 is out! 📌 This release brings many improvements around error handling, reasoning models, rate limiting performance, and more.
    Full Changelog:
    🚨 [Breaking Change] TensorZero will normalize the reported `usage` from different model providers. Moving forward, `input_tokens` and `output_tokens` include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with `include_raw_usage`.
    ⚠️ [Planned Deprecation] Migrate `include_original_response` to `include_raw_response`. For advanced variant types, the former only returned the last model inference, whereas the latter returns every model inference with associated metadata.
    ⚠️ [Planned Deprecation] Migrate `allow_auto_detect_region = true` to `region = "sdk"` when configuring AWS model providers. The behavior is identical.
    ⚠️ [Planned Deprecation] Provide the proper API base rather than the full endpoint when configuring custom Anthropic providers.
    🔨 Fix a regression that triggered incorrect warnings about usage reporting for streaming inferences with Anthropic models.
    🔨 Fix a bug in the TensorZero Python SDK that discarded some request fields in certain multi-turn inferences with tools.
    🆕 Improve error handling across many areas: TensorZero UI, JSON deserialization, AWS providers, streaming inferences, timeouts, etc.
    🆕 Support Valkey (Redis) for improving performance of rate limiting checks (recommended at 100+ QPS).
    🆕 Support `reasoning_effort` for Gemini 3 models (mapped to `thinkingLevel`).
    🆕 Improve handling of Anthropic reasoning models in TensorZero JSON functions. Moving forward, `json_mode = "strict"` will use the beta structured outputs feature; `json_mode = "on"` still uses the legacy assistant message prefill.
    🆕 Improve handling of reasoning content in the OpenRouter and xAI model providers.
    🆕 Add `extra_headers` support for embedding models. (thanks jonaylor89!)
    🆕 Support dynamic credentials for AWS Bedrock and AWS SageMaker model providers.
    & multiple under-the-hood and UI improvements (thanks ndoherty-xyz)!
    https://lnkd.in/ewY7izAh
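The usage normalization can be sketched roughly as below. The raw field names (`prompt_tokens`, `cached_prompt_tokens`, etc.) are illustrative assumptions, since the post only states that all token variations are folded into `input_tokens`/`output_tokens`:

```python
def normalize_usage(raw: dict) -> dict:
    """Fold provider-specific token variants into OpenAI-style totals.

    Illustrative only: the field names below are assumptions, not
    TensorZero's actual schema. The point is that cached and reasoning
    tokens now count toward input/output totals, as with OpenAI.
    """
    input_tokens = raw.get("prompt_tokens", 0) + raw.get("cached_prompt_tokens", 0)
    output_tokens = raw.get("completion_tokens", 0) + raw.get("reasoning_tokens", 0)
    return {"input_tokens": input_tokens, "output_tokens": output_tokens}
```

If you need the unfolded numbers, the release notes say `include_raw_usage` still returns the provider-reported usage verbatim.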

  • TensorZero 2026.1.2 is out! 📌 This is a small release that improves the developer experience of using long-tail LLM capabilities.
    🆕 Support appending to arrays with `extra_body` using the `/my_array/-` notation.
    🆕 Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
    & multiple under-the-hood and UI improvements
    https://lnkd.in/ekXYH2nc
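The `/my_array/-` notation mirrors JSON Pointer's array-append token (`-`). A rough illustration of the semantics, not TensorZero's implementation:

```python
def apply_extra_body(body: dict, pointer: str, value) -> dict:
    """Apply one extra_body-style patch to a request body.

    A pointer ending in `/-` appends to the array at the parent path
    (creating it if absent); any other pointer assigns the value at
    that key. Minimal sketch of the semantics only.
    """
    keys = pointer.strip("/").split("/")
    if keys[-1] == "-":
        # e.g. "/my_array/-" appends `value` to the array at "/my_array"
        parent = body
        for key in keys[:-2]:
            parent = parent.setdefault(key, {})
        parent.setdefault(keys[-2], []).append(value)
    else:
        parent = body
        for key in keys[:-1]:
            parent = parent.setdefault(key, {})
        parent[keys[-1]] = value
    return body
```

Appending (rather than replacing) matters for provider fields like stop-sequence arrays, where you want to add an entry without clobbering what the gateway already set.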

  • TensorZero 2026.1.1 is out! 📌 This release brings improvements and bug fixes to token usage reporting.
    ⚠️ [Planned Deprecation] In a future release, the parameter `model` will be required when initializing `DICLOptimizationConfig`. The parameter remains optional (defaults to `openai::gpt-5-mini`) in the meantime.
    🔨 Stop buffering `raw_usage` when streaming with the OpenAI-compatible inference endpoint; instead, emit `raw_usage` as soon as possible, just like in the native endpoint.
    🔨 Stop reporting zero usage in every chunk when streaming a cached inference; instead, report zero usage only in the final chunk, as expected.
    🆕 Support `stream_options.include_usage` for every model under the Azure provider.
    & multiple under-the-hood and UI improvements!
    https://lnkd.in/e84XXYm8
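With usage emitted only where it belongs, a streaming consumer can simply sum `usage` across chunks. A minimal sketch assuming dict-shaped chunks (field names follow the normalized `usage` schema described in these notes):

```python
def total_usage(chunks: list[dict]) -> dict:
    """Sum `usage` across stream chunks.

    After this fix, a cached streaming inference reports (zero) usage
    only in the final chunk, so a straight sum over all chunks yields
    the correct total either way.
    """
    totals = {"input_tokens": 0, "output_tokens": 0}
    for chunk in chunks:
        usage = chunk.get("usage")
        if usage:
            totals["input_tokens"] += usage.get("input_tokens", 0)
            totals["output_tokens"] += usage.get("output_tokens", 0)
    return totals
```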

  • TensorZero 2026.1.0 is out! 📌 This release adds an optional `include_raw_usage` parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized `usage` response field.
    🚨 [Breaking Change] The Prometheus metric `tensorzero_inference_latency_overhead_seconds` will report a histogram instead of a summary. You can customize the buckets using `gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets` in the configuration (default: 1ms, 10ms, 100ms).
    ⚠️ [Planned Deprecation] Deprecate the `TENSORZERO_CLICKHOUSE_URL` environment variable from the UI. Moving forward, the UI will query data through the gateway and does not communicate directly with ClickHouse.
    ⚠️ [Planned Deprecation] Rename the Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds`. Both metrics will be emitted for now.
    ⚠️ [Planned Deprecation] Rename the configuration field `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`. Both fields are available for now.
    🆕 Add optional `include_raw_usage` parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized `usage` response field.
    🆕 Add optional `--bind-address` CLI flag to the gateway.
    🆕 Add optional `description` field to metrics in the configuration.
    🆕 Add option to fine-tune Fireworks models without automatic deployment.
    & multiple under-the-hood and UI improvements (thanks ecalifornica achaljhawar rguilmont)!
    https://lnkd.in/eKZVcUgZ
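The switch from a summary to a histogram means each observation is counted into cumulative buckets. A small sketch of Prometheus-style bucketing using the stated default bounds (1ms, 10ms, 100ms, plus the implicit +Inf bucket):

```python
def histogram_counts(latencies_s, buckets_s=(0.001, 0.01, 0.1)):
    """Count observations per cumulative (Prometheus-style) bucket.

    Each bucket `le` counts all observations <= its upper bound, so
    counts are monotonically non-decreasing across buckets. Default
    bounds mirror this release's stated defaults for the latency
    overhead metric: 1ms, 10ms, 100ms.
    """
    bounds = list(buckets_s) + [float("inf")]
    counts = {le: 0 for le in bounds}
    for latency in latencies_s:
        for le in bounds:
            if latency <= le:  # cumulative: counted in every bucket it fits
                counts[le] += 1
    return counts
```

Unlike summaries, histograms can be aggregated across gateway replicas and re-quantiled at query time, which is the usual motivation for this kind of migration.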

  • TensorZero 2025.12.6 is out! 📌 This release introduces Gateway Relay. With gateway relay, an LLM inference request can be routed through multiple independent TensorZero Gateway deployments before reaching a model provider. This enables you to enforce organization-wide controls (e.g. auth, rate limits, credentials) without restricting how teams build their LLM features. https://lnkd.in/e7JByKhS
    🚨 [Breaking Changes] Migrated the following optimization fields from the TensorZero Python SDK to the configuration:
    - `DICLOptimizationConfig`: removed `credential_location`.
    - `FireworksSFTConfig`: moved `account_id` to `[provider_types.fireworks.sft]`; removed `api_base` and `credential_location`.
    - `GCPVertexGeminiSFTConfig`: moved `bucket_name`, `bucket_path_prefix`, `kms_key_name`, `project_id`, `region`, and `service_account` to `[provider_types.gcp_vertex_gemini.sft]`.
    - `OpenAIRFTConfig`: removed `api_base` and `credential_location`.
    - `OpenAISFTConfig`: removed `api_base` and `credential_location`.
    - `TogetherSFTConfig`: moved `hf_api_token`, `wandb_api_key`, `wandb_base_url`, and `wandb_project_name` to `[provider_types.together.sft]`; removed `api_base` and `credential_location`.
    🆕 Support gateway relay.
    🆕 Add "Try with model" button to the datapoint page in the UI.
    🆕 Add `tensorzero_inference_latency_overhead_seconds_histogram` Prometheus metric for meta-observability.
    🆕 Add `concurrency` parameter to `experimental_render_samples` (defaults to 100).
    🆕 Add `otlp_traces_extra_attributes` and `otlp_traces_extra_resources` to the TensorZero Python SDK. (thanks jinnovation!)
    & multiple under-the-hood and UI improvements (thanks ecalifornica)!
    https://lnkd.in/e5DxBqqR

  • TensorZero 2025.12.5 is out! 📌 This version introduces a revamped dataset builder in the UI that supports complex queries (e.g. filter by tags, feedback, logical operators). 💡 We're skipping version 2025.12.4 due to an issue in our publishing process.
    ⚠️ [Planned Deprecation] The variant type `experimental_chain_of_thought` will be deprecated in `2026.2+`. As reasoning models are becoming prevalent, please use their native reasoning capabilities.
    ⚠️ [Planned Deprecation] The `timeout_s` configuration field for best/mixture-of-N variants will be deprecated in `2026.2+`. Please use the `[timeouts]` block in the configuration for their candidates instead.
    🆕 Expand the dataset builder in the UI to support complex queries (e.g. filter by tags, feedback).
    🆕 Export `tensorzero_inference_latency_overhead_seconds` Prometheus metric for meta-observability.
    🆕 Allow users to disable TensorZero API keys using `--disable-api-key` in the CLI. (thanks jinnovation!)
    & multiple under-the-hood and UI improvements (thanks ecalifornica)!
    https://lnkd.in/e-whxWeS
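A query combining tags, feedback, and logical operators can be modeled as a small recursive filter. The query shape below is invented for illustration; it is not the dataset builder's actual query format:

```python
def matches(inference: dict, query: dict) -> bool:
    """Evaluate a tiny tag/feedback filter with AND/OR operators.

    Hypothetical query shape: {"op": "and"/"or", "clauses": [...]}
    for logical operators, {"op": "tag", "key", "value"} and
    {"op": "feedback", "metric", "value"} for leaf predicates.
    """
    op = query.get("op")
    if op == "and":
        return all(matches(inference, q) for q in query["clauses"])
    if op == "or":
        return any(matches(inference, q) for q in query["clauses"])
    if op == "tag":
        return inference.get("tags", {}).get(query["key"]) == query["value"]
    if op == "feedback":
        return inference.get("feedback", {}).get(query["metric"]) == query["value"]
    raise ValueError(f"unknown op: {op}")
```

Nesting `and`/`or` clauses gives arbitrary boolean combinations, which is what makes such a builder useful for curating fine-tuning datasets from observability data.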

Funding

1 total round
Last round: Seed, US$ 7.3M

Data from Crunchbase