Releases: deepset-ai/haystack

v2.22.0

08 Jan 14:25

⭐️ Highlights

✂️ Smarter Document Chunking with Embedding-Based Splitting

Introducing the new EmbeddingBasedDocumentSplitter, a component that takes an embedder and splits documents based on semantic similarity rather than fixed sizes or rules.

from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import EmbeddingBasedDocumentSplitter

# Initialize an embedder to calculate semantic similarities
embedder = SentenceTransformersDocumentEmbedder()

# Configure the splitter with parameters that control splitting behavior
splitter = EmbeddingBasedDocumentSplitter(
    document_embedder=embedder,
    sentences_per_group=2,      # Group 2 sentences before calculating embeddings
    percentile=0.95,            # Split when cosine distance exceeds 95th percentile
    min_length=50,              # Merge splits shorter than 50 characters
    max_length=1000             # Further split chunks longer than 1000 characters
)
doc = Document(content="First topic sentence. More about the first topic. A completely different topic starts here.")
result = splitter.run(documents=[doc])

🔥 warm_up Runs Automatically on First Use

Components that define a warm_up method now run it automatically on first execution, removing the need for manual calls and preventing errors in standalone usage.

from haystack.components.embedders import SentenceTransformersTextEmbedder

text_embedder = SentenceTransformersTextEmbedder()
# text_embedder.warm_up() # ❌ Don't need this step anymore
print(text_embedder.run("I love pizza!"))

# {'embedding': [-0.07804739475250244, 0.1498992145061493, ...]}

🛠️ Multiple Tool String Outputs with outputs_to_string

Tools can now expose multiple string outputs via the new outputs_to_string configuration, giving you fine-grained control over how tool results are surfaced to the LLM, without changing the underlying tool logic.

def format_documents(documents):
    return "\n".join(f"{i+1}. Document: {doc.content}" for i, doc in enumerate(documents))

def format_summary(metadata):
    return f"Found {metadata['count']} results"

tool = Tool(
    name="search",
    description="Search for documents",
    parameters={...},
    function=search_func,  # Returns {"documents": [Document(...)], "metadata": {"count": 5}, "debug_info": {...}}
    outputs_to_string={
        "formatted_docs": {"source": "documents", "handler": format_documents},
        "summary": {"source": "metadata", "handler": format_summary}
        # Note: "debug_info" is not included, so it won't be converted to a string
    }
)

# After the tool invocation, the tool result includes:
# {
#     "formatted_docs": "1. Document Title\n   Content...\n2. ...",
#     "summary": "Found 5 results"
# }

🐍 Python 3.10+ Only

Haystack now requires Python 3.10 or later, as Python 3.9 reached End of Life (EOL) in October 2025.

⬆️ Upgrade Notes

  • HuggingFaceLocalChatGenerator now uses Qwen/Qwen3-0.6B as the default model, replacing the previous default.

⚡️ Enhancement Notes

  • The parameters query_suffix and document_suffix have been added to SentenceTransformersSimilarityRanker to support the Qwen3 reranker model family.

    Here is an example of how to use these new parameters with the Qwen3-Reranker-0.6B model:

    from haystack import Document
    from haystack.components.rankers.sentence_transformers_similarity import SentenceTransformersSimilarityRanker
    
    ranker = SentenceTransformersSimilarityRanker(
        model="tomaarsen/Qwen3-Reranker-0.6B-seq-cls",
        query_prefix='<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: Given a web search query, retrieve relevant passages that answer the query\n<Query>: ',
        query_suffix="\n",
        document_prefix="<Document>: ",
        document_suffix="<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n",
    )
    
    result = ranker.run(
        query="Which planet is known as the Red Planet?",
        documents=[
            Document(content="Venus is often called Earth's twin because of its similar size and proximity."),
            Document(content="Mars, known for its reddish appearance, is often referred to as the Red Planet."),
            Document(content="Jupiter, the largest planet in our solar system, has a prominent red spot."),
            Document(content="Saturn, famous for its rings, is sometimes mistaken for the Red Planet."),
        ],
    )
    
    print(result)

    NOTE: This only works with Qwen3 reranker models that use the sequence classification architecture; such models can be found, for example, on tomaarsen's Hugging Face profile.

  • Added reasoning content support to HuggingFaceAPIChatGenerator. The component now extracts reasoning content from models that support chain-of-thought reasoning (e.g., DeepSeek R1). Both streaming and non-streaming modes are supported. Access via reply.reasoning.reasoning_text.
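
    A minimal sketch (the model name is illustrative, and the reply only carries reasoning when the model emits it):

    from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
    from haystack.dataclasses import ChatMessage

    generator = HuggingFaceAPIChatGenerator(
        api_type="serverless_inference_api",
        api_params={"model": "deepseek-ai/DeepSeek-R1"},
    )
    reply = generator.run(messages=[ChatMessage.from_user("Why is the sky blue?")])["replies"][0]
    if reply.reasoning:  # present only for reasoning-capable models
        print(reply.reasoning.reasoning_text)
    print(reply.text)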

  • When an Agent runs as part of a Pipeline, the agent's tracing span now uses the component span as its parent. This enables proper nested trace visualization in tracing tools like Datadog, Braintrust, or OpenTelemetry backends.

  • The _handle_async_stream_response() method in OpenAIChatGenerator now handles asyncio.CancelledError exceptions. When a streaming task is cancelled mid-stream, the async for loop gracefully closes the stream using asyncio.shield() to ensure the cleanup operation completes even during cancellation.

  • A new enable_thinking parameter has been added to enable thinking mode in chat templates for thinking-capable models, allowing them to generate intermediate reasoning steps before producing final responses.

  • Add support for PEP 604 type syntax. This means that when defining types in components, you can use X | Y instead of Union[X, Y] and X | None instead of Optional[X]. The codebase has been migrated to the new syntax, but both syntaxes are fully supported.
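
    For example, a component's run method can now be typed with the new syntax. A minimal sketch (the component itself is a toy example):

    from haystack import component

    @component
    class GreetingBuilder:
        @component.output_types(greeting=str)
        def run(self, name: str, title: str | None = None):
            # PEP 604: `str | None` instead of Optional[str]
            prefix = f"{title} " if title else ""
            return {"greeting": f"Hello, {prefix}{name}!"}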

  • Support Multiple Tool String Outputs

    Added support for tools to define multiple string outputs using the outputs_to_string configuration. This allows users to specify how different parts of a tool's output should be converted to strings, enhancing flexibility in handling tool results.

    • Updated ToolInvoker to handle multiple output configurations.
    • Updated Tool to validate and store multiple output configurations.
    • Added tests to verify the functionality of multiple string outputs.

    This enables tools to provide rich, varied context to language models or downstream components without requiring multiple tool calls, while keeping full control over which outputs are stringified.

  • Added validation for inputs_from_state and outputs_to_state parameters in the Tool class. Tools now validate at construction time that state mappings reference valid tool parameters and outputs, catching configuration errors early instead of at runtime. The validation uses function introspection and JSON schema to ensure parameter names exist, and subclasses like ComponentTool validate against component input/output sockets.
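
    A minimal sketch of the fail-fast behavior (the tool is a toy example; the exact exception raised may differ):

    from haystack.tools import Tool

    def search(query: str) -> dict:
        return {"documents": [f"result for {query}"]}

    tool = Tool(
        name="search",
        description="Search for documents",
        parameters={"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
        function=search,
        inputs_from_state={"user_query": "query"},  # "query" must be a real tool parameter
    )
    # A typo such as inputs_from_state={"user_query": "qury"} now raises at
    # construction time instead of failing later at runtime.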

🐛 Bug Fixes

  • Improved error messages in ConditionalRouter when non-string values are provided as route outputs. Users now receive clear guidance (e.g., "use '2' instead of 2") instead of the cryptic "Can't compile non template nodes" error. A valid configuration is shown in the sketch after this list.
  • Fixes Jinja2 variable detection in ConditionalRouter, ChatPromptBuilder, PromptBuilder, and OutputAdapter by properly skipping variables that are assigned within the template. Previously, under specific scenarios, variables assigned within a template would falsely be picked up as input variables to the component. For more information, see the parent issue in the Jinja2 library: pallets/jinja#2069
  • Fixes deserializing an instance of NamedEntityExtractor when pipeline_kwargs is stored in the deserialization dict with the value of None.
  • When creating an HTTP client object from a dictionary, we now convert the limits parameter to an httpx.Limits object to avoid AttributeError.
  • Raise a ValueError when an async function is passed to the Tool class. Async functions are not supported as tools. This change provides a clear error message instead of silent failures where coroutines are never awaited.
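
A minimal sketch of the rule the improved ConditionalRouter message points to: route outputs must be Jinja template strings, not raw Python values.

from haystack.components.routers import ConditionalRouter

routes = [
    {
        "condition": "{{ streams|length > 2 }}",
        "output": "{{ streams }}",  # a template string, not the bare value
        "output_name": "enough_streams",
        "output_type": list,
    },
    {
        "condition": "{{ streams|length <= 2 }}",
        "output": "{{ streams }}",
        "output_name": "insufficient_streams",
        "output_type": list,
    },
]
router = ConditionalRouter(routes=routes)
print(router.run(streams=[1, 2, 3]))  # {'enough_streams': [1, 2, 3]}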

⚠️ Deprecation Notes

  • The return_empty_on_no_match parameter of the RegexTextExtractor component has been deprecated and is currently ignored. The component now always returns a dictionary with the key "captured_text", whose value is either the captured text or an empty string if no match is found. Starting from Haystack 2.23.0, initializing the component with this parameter will raise an error.

💙 Big thank you to everyone who contributed to this release!

@anakin87, @ArzelaAscoIi, @bilgeyucel, @Bobholamovic, @davidsbatista, @dfokina, @GunaPalanivel, @majiayu000, @OliverZhangA, @sjrl, @TaMaN2031A, @tommasocerruti, @tstadel, @vblagoje, @YassineGabsi

v2.22.0-rc1

07 Jan 13:31

Pre-release

v2.21.0

08 Dec 15:47

⭐️ Highlights

🔍 Smarter, Broader Retrieval with Multi-Query RAG

This release introduces three new components that significantly boost retrieval recall in RAG systems by expanding the user query and retrieving documents across multiple reformulations:

  • QueryExpander generates semantically similar variations of a user query to broaden search coverage.
  • MultiQueryTextRetriever runs multiple queries in parallel using a text-based retriever (e.g., BM25) and merges results by score.
  • MultiQueryEmbeddingRetriever performs the same multi-query retrieval flow using embeddings, enabling richer semantic recall.

Used together, these components create a multi-query retrieval pipeline that improves recall especially when queries are short or ambiguous.

🧪 Example: Expanding a Query and Retrieving More Relevant Documents

from haystack import Pipeline
from haystack.components.query import QueryExpander
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.retrievers import MultiQueryTextRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy

# Sample documents
docs = [
    Document(content="Renewable energy comes from natural sources like wind and sunlight."),
    Document(content="Geothermal energy is heat from beneath the Earth's surface."),
    Document(content="Hydropower generates electricity using flowing water."),
]

# Store documents
store = InMemoryDocumentStore()
writer = DocumentWriter(document_store=store, policy=DuplicatePolicy.SKIP)
writer.run(documents=docs)

# Components
expander = QueryExpander()
retriever = InMemoryBM25Retriever(document_store=store, top_k=1)
multi_retriever = MultiQueryTextRetriever(retriever=retriever)

# Expand and retrieve
expanded = expander.run(query="renewable energy")
results = multi_retriever.run(queries=expanded["queries"])

for doc in results["documents"]:
    print(doc.content)

This pipeline expands "renewable energy" into multiple related queries, retrieves documents for each in parallel, and returns a richer set of relevant results — demonstrating how multi-query retrieval improves recall with minimal effort.

⬆️ Upgrade Notes

  • Updated the default Azure OpenAI model from gpt-4o-mini to gpt-4.1-mini and the default API version from 2023-05-15 to 2024-12-01-preview for both AzureOpenAIGenerator and AzureOpenAIChatGenerator.
  • The default OpenAI model has been changed from gpt-4o-mini to gpt-5-mini for OpenAIChatGenerator and OpenAIGenerator. If you rely on the default model and need to continue using gpt-4o-mini, explicitly specify it when initializing these components: OpenAIChatGenerator(model="gpt-4o-mini").

🚀 New Features

  • Three new components added: QueryExpander, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever. When used together, they allow a query to be expanded, with each expansion used to retrieve a potentially different set of documents.

⚡️ Enhancement Notes

  • Added a return_empty_on_no_match parameter (default True) to RegexTextExtractor.__init__(). When set to False, the component returns {"captured_text": ""} instead of {} when no regex match is found. Provides a consistent output structure for pipeline integration.
  • The FilterRetriever and AutoMergingRetriever components now support asynchronous execution.
  • Previously, when using tracing with objects like ByteStream and ImageContent, the payload sent to the tracing backend could become too large, hitting provider limits or causing performance degradation. We now replace these objects with string placeholders to avoid oversized payloads.
  • The default OpenAI model for OpenAIChatGenerator and OpenAIGenerator has been updated from gpt-4o-mini to gpt-5-mini.

🐛 Bug Fixes

  • Ensure request header keys are unique in link_content to prevent 400 Bad Request errors.

    Some image providers return a 400 Bad Request when using ImageContent.from_url() because the User-Agent header appears multiple times with different casing (e.g., user-agent, User-Agent). This update normalizes header keys in a case-insensitive way, removes duplicates, and preserves only the last occurrence.

  • Fixed a bug where components explicitly listed in include_outputs_from would not appear in the pipeline results if they returned an empty dictionary. Now, any component specified in include_outputs_from will be included in the results regardless of whether its output is empty.
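
    A minimal sketch (component and input names are illustrative):

    from haystack import Pipeline
    from haystack.components.builders import PromptBuilder

    pipe = Pipeline()
    pipe.add_component("builder", PromptBuilder(template="Question: {{ question }}"))
    result = pipe.run(
        data={"builder": {"question": "What is Haystack?"}},
        include_outputs_from={"builder"},
    )
    # "builder" now always appears in the results, even when its output dict is empty
    print(result["builder"])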

  • Fix the serialization and deserialization of pipeline_outputs in pipeline_snapshot and make it use the same schema as the rest of the pipeline state when running pipelines with breakpoints. Deserialization of the older pipeline_outputs format without a serialization schema remains supported until Haystack 2.23.0.

  • Fixed ToolInvoker missing tools after warmup for lazy-initialized toolsets. The invoker now refreshes its tool registry post-warmup, ensuring replaced placeholders (e.g., MCPToolset with eager_connect=False) resolve to the actual tool names at invocation time.

💙 Big thank you to everyone who contributed to this release!

@Amnah199, @anakin87, @davidsbatista, @dfokina, @mrchtr, @OscarPindaro, @schwartzadev, @sjrl, @TaMaN2031A, @vblagoje, @YassineGabsi, @ZeJ0hn

v2.21.0-rc1

03 Dec 20:33

Pre-release

v2.20.0

13 Nov 15:06

⭐️ Highlights

Support for OpenAI's Responses API

Haystack now integrates OpenAI's Responses API through the new OpenAIResponsesChatGenerator and AzureOpenAIResponsesChatGenerator components.

This unlocks several advanced capabilities like:

  • Retrieving concise summaries of the model’s reasoning process.
  • Using native OpenAI or MCP tool formats alongside Haystack Tool objects and Toolset instances.

Example with reasoning and a web search tool:

from haystack.components.generators.chat import OpenAIResponsesChatGenerator, AzureOpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage

# with `OpenAIResponsesChatGenerator`
chat_generator = OpenAIResponsesChatGenerator(
    model="o3-mini",
    generation_kwargs={"summary": "auto", "effort": "low"},
    tools=[{"type": "web_search"}],
)
response = chat_generator.run(messages=[ChatMessage.from_user("What's a positive news story from today?")])

# with `AzureOpenAIResponsesChatGenerator`
chat_generator = AzureOpenAIResponsesChatGenerator(
    azure_endpoint="https://example-resource.azure.openai.com/",
    azure_deployment="gpt-5-mini",
    generation_kwargs={"reasoning": {"effort": "low", "summary": "auto"}},
)
response = chat_generator.run(messages=[ChatMessage.from_user("What's Natural Language Processing?")])

print(response["replies"][0].text)

🚀 New Features

  • Added the AzureOpenAIResponsesChatGenerator, a new component that integrates Azure OpenAI's Responses API into Haystack.
  • Added the OpenAIResponsesChatGenerator, a new component that integrates OpenAI's Responses API into Haystack.
  • If logprobs are enabled in the generation kwargs, return logprobs in ChatMessage.meta for OpenAIChatGenerator and OpenAIResponsesChatGenerator.
  • Added an extra field to ToolCall and ToolCallDelta to store provider-specific information.
  • Updated serialization and deserialization of PipelineSnapshots to work with pydantic BaseModels.
  • Added async support to SentenceWindowRetriever with a new run_async() method, allowing the retriever to be used in async pipelines and workflows.
  • Added warm_up() method to all ChatGenerator components (OpenAIChatGenerator, AzureOpenAIChatGenerator, HuggingFaceAPIChatGenerator, HuggingFaceLocalChatGenerator, and FallbackChatGenerator) to properly initialize tools that require warm-up before pipeline execution. The warm_up() method is idempotent and follows the same pattern used in Agent and ToolInvoker components. This enables proper tool initialization in pipelines that use ChatGenerators with tools but without an Agent component.
  • The AnswerBuilder component now exposes a new parameter return_only_referenced_documents (default: True) that controls whether only documents referenced in the replies are returned. Returned documents include two new fields in the meta dictionary:
    • source_index: the 1-based index of the document in the input list
    • referenced: a boolean value indicating if the document was referenced in the replies (only present if the reference_pattern parameter is provided).
      These additions make it easier to display references and other sources within a RAG pipeline.
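
    A minimal sketch of the new metadata (the reference pattern and documents are illustrative):

    from haystack import Document
    from haystack.components.builders import AnswerBuilder

    builder = AnswerBuilder(reference_pattern="\\[(\\d+)\\]")
    result = builder.run(
        query="What color is the sky?",
        replies=["The sky is blue [2]."],
        documents=[Document(content="Grass is green."), Document(content="The sky is blue.")],
    )
    answer = result["answers"][0]
    for doc in answer.documents:  # only the referenced document, by default
        print(doc.meta["source_index"], doc.meta["referenced"], doc.content)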

⚡️ Enhancement Notes

  • Adds generation_kwargs to the Agent component, allowing for more fine-grained control at run-time over chat generation.
  • Added a revision parameter to all Sentence Transformers embedder components (SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder, SentenceTransformersSparseDocumentEmbedder, and SentenceTransformersSparseTextEmbedder) to allow users to specify a specific model revision/version from the Hugging Face Hub. This enables pinning to a particular model version for reproducibility and stability.
  • Updated the components Agent, LLMMetadataExtractor, LLMMessagesRouter, and LLMDocumentContentExtractor to automatically call self.warm_up() at runtime if they have not been warmed up yet. This ensures that the components are ready for use without requiring an explicit warm-up call. This differs from previous behavior where warm-up had to be manually invoked before use, otherwise a RuntimeError was raised.
  • Improved log-trace correlation for DatadogTracer by using the official ddtrace.tracer.get_log_correlation_context() method.
  • Improved Toolset warm-up architecture for better encapsulation. The base Toolset.warm_up() method now warms up all tools by default, while subclasses can override it to customize initialization (e.g., setting up shared resources instead of warming individual tools). The warm_up_tools() utility function has been simplified to delegate to Toolset.warm_up().

🐛 Bug Fixes

  • Fixed deserialization of state schema when it is None in Agent.from_dict.

  • Fixed a bug where components explicitly listed in include_outputs_from would not appear in the pipeline results if they returned an empty dictionary. Now, any component specified in include_outputs_from will be included in the results regardless of whether its output is empty.

  • Fixed type compatibility issue where passing list[Tool] to components with a tools parameter (such as ToolInvoker) caused static type checker errors.
    In version 2.19, the ToolsType was changed to Union[list[Union[Tool, Toolset]], Toolset] to support mixing Tools and Toolsets. However, due to Python's list invariance, list[Tool] was no longer considered compatible with list[Union[Tool, Toolset]], breaking type checking for the common pattern of passing a list of Tool objects.

    The fix explicitly lists all valid type combinations in ToolsType: Union[list[Tool], list[Toolset], list[Union[Tool, Toolset]], Toolset]. This preserves backward compatibility for existing code while still supporting the new functionality of mixing Tools and Toolsets.

    Users who encountered type errors like "Argument of type 'list[Tool]' cannot be assigned to parameter 'tools'" should no longer see these errors after upgrading. No code changes are required on the user side.

  • When creating a pipeline snapshot, we now ensure use of _deepcopy_with_exceptions when copying component inputs to avoid deep copies of items like components and tools since they often contain attributes that are not deep-copyable.
    For example, the LinkContentFetcher has httpx.Client as an attribute, which throws an error if deep-copied.

💙 Big thank you to everyone who contributed to this release!

@Amnah199, @anakin87, @cmnemoi, @davidsbatista, @dfokina, @HamidOna, @Hansehart, @jdb78, @mrchtr, @sjrl, @swapniel99, @TaMaN2031A, @tstadel, @vblagoje

v2.20.0-rc2

13 Nov 10:55

Pre-release

v2.20.0-rc1

11 Nov 14:59

Pre-release

v2.19.0

20 Oct 12:53

⭐️ Highlights

🛡️ Try Multiple LLMs with FallbackChatGenerator

Introduced FallbackChatGenerator, a resilient chat generator that runs multiple LLMs sequentially and automatically falls back when one fails. It tries each generator in order until one succeeds, handling errors like timeouts, rate limits, or server issues. Ideal for building robust, production-grade chat systems that stay responsive across providers.

from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.google_genai import GoogleGenAIChatGenerator
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator
from haystack.components.generators.chat.openai import OpenAIChatGenerator
from haystack.components.generators.chat.fallback import FallbackChatGenerator

anthropic_generator = AnthropicChatGenerator(model="claude-sonnet-4-5", timeout=1) # force failure with low timeout
google_generator = GoogleGenAIChatGenerator(model="gemini-2.5-flashy") # force failure with typo in model name
openai_generator = OpenAIChatGenerator(model="gpt-4o-mini") # success

chat_generator = FallbackChatGenerator(chat_generators=[anthropic_generator, google_generator, openai_generator])
response = chat_generator.run(messages=[ChatMessage.from_user("What is the plot twist in Shawshank Redemption?")])

print("Successful ChatGenerator: ", response["meta"]["successful_chat_generator_class"])
print("Response: ", response["replies"][0].text)

Output:

WARNING:haystack.components.generators.chat.fallback:ChatGenerator AnthropicChatGenerator failed with error: Request timed out or interrupted...
WARNING:haystack.components.generators.chat.fallback:ChatGenerator GoogleGenAIChatGenerator failed with error: Error in Google Gen AI chat generation: 404 NOT_FOUND...
Successful ChatGenerator:   OpenAIChatGenerator
Response:  In "The Shawshank Redemption," ....

🛠️ Mix Tool and Toolset in Agents

You can now combine both Tool and Toolset objects in the same tools list for Agent and ToolInvoker components. This update brings more flexibility, letting you organize tools into logical groups while still adding standalone tools in one go.

from haystack.components.agents import Agent
from haystack.tools import Tool, Toolset

math_toolset = Toolset([add_tool, multiply_tool])
weather_toolset = Toolset([weather_tool, forecast_tool])

agent = Agent(
    chat_generator=generator,
    tools=[math_toolset, weather_toolset, calendar_tool],  # ✨ Now supported!
)

⚙️ Faster Agents with Tool Warmup

Tool and Toolset objects can now perform initialization during Agent or ToolInvoker warmup. This allows setup tasks such as connecting to databases, loading models, or initializing connection pools before the first use.

from haystack.tools import Toolset
from haystack.components.agents import Agent

# Custom toolset with initialization needs
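# (query_tool, update_tool, and create_connection_pool stand in for app-specific objects)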
class DatabaseToolset(Toolset):
    def __init__(self, connection_string):
        self.connection_string = connection_string
        self.pool = None
        super().__init__([query_tool, update_tool])
        
    def warm_up(self):
        # Initialize connection pool
        self.pool = create_connection_pool(self.connection_string)

🚀 New Features

  • Updated our serialization and deserialization of PipelineSnapshots to work with Python Enum classes.

  • Added FallbackChatGenerator that automatically retries different chat generators and returns the first successful response, with detailed information about which providers were tried.

  • Added pipeline_snapshot and pipeline_snapshot_file_path parameters to BreakpointException to provide more context when a pipeline breakpoint is triggered.
    Added pipeline_snapshot_file_path parameter to PipelineRuntimeError to include a reference to the stored pipeline snapshot so it can be easily found.

  • Added a new component, RegexTextExtractor, which extracts text from chat messages or string inputs based on a custom regex pattern.

  • CSVToDocument: added conversion_mode='row' with an optional content_column; each row becomes a Document, remaining columns are stored in meta, and the default 'file' mode is preserved.
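
    A minimal sketch of row mode (parameter names as described above):

    from haystack.components.converters import CSVToDocument
    from haystack.dataclasses import ByteStream

    csv_source = ByteStream(data=b"name,notes\nAda,Analytical Engine\nAlan,Enigma", mime_type="text/csv")
    converter = CSVToDocument(conversion_mode="row", content_column="notes")
    result = converter.run(sources=[csv_source])
    for doc in result["documents"]:
        print(doc.content, doc.meta)  # content from `notes`, remaining columns in meta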

  • Added the ability to resume an Agent from an AgentSnapshot while specifying a new breakpoint in the same run call. This allows stepwise debugging and precise control over chat generator inputs and tool inputs before execution, improving flexibility when inspecting intermediate states. This addresses a previous limitation where passing both a snapshot and a breakpoint simultaneously would throw an exception.

  • Introduce SentenceTransformersSparseTextEmbedder and SentenceTransformersSparseDocumentEmbedder components. These components embed text and documents using sparse embedding models compatible with Sentence Transformers. Sparse embeddings are interpretable, efficient when used with inverted indexes, combine classic information retrieval with neural models, and are complementary to dense embeddings. Currently, the produced SparseEmbedding objects are compatible with the QdrantDocumentStore.

    Usage example:

    from haystack.components.embedders import SentenceTransformersSparseTextEmbedder
    
    text_embedder = SentenceTransformersSparseTextEmbedder()
    text_embedder.warm_up()
    
    print(text_embedder.run("I love pizza!"))
    # {'sparse_embedding': SparseEmbedding(indices=[999, 1045, ...], values=[0.918, 0.867, ...])}
  • Added a warm_up() function to the Tool dataclass, allowing tools to perform resource-intensive initialization before execution. Tools and Toolsets can now override the warm_up() method to establish connections to remote services, load models, or perform other preparatory operations. The ToolInvoker and Agent automatically call warm_up() on their tools during their own warm-up phase, ensuring tools are ready before use.

  • Fixed a serialization issue related to function objects in a pipeline; they are now converted to None, since functions cannot be serialized. This issue was preventing breakpoints from being set in agents and used as resume points. If an error occurs during Agent execution, for instance during tool calling, a snapshot of the last successful step is raised, allowing the caller to catch it, inspect the possible reason for the crash, and resume the pipeline execution from that point onwards.

⚡️ Enhancement Notes

  • Added tools to agent run parameters to enhance the agent's flexibility. Users can now choose a subset of tools for the agent at runtime by providing a list of tool names, or supply an entirely new set by passing Tool objects or a Toolset. A sketch follows this list.
  • Enhanced the tools parameter across all tool-accepting components (Agent, ToolInvoker, OpenAIChatGenerator, AzureOpenAIChatGenerator, HuggingFaceAPIChatGenerator, HuggingFaceLocalChatGenerator) to accept either a mixed list of Tool and Toolset objects or just a Toolset object. Previously, components required either a list of Tool objects OR a single Toolset, but not both in the same list. Now users can organize tools into logical Toolsets while also including standalone Tool objects, providing greater flexibility in tool organization. For example: Agent(chat_generator=generator, tools=[math_toolset, weather_toolset, standalone_tool]). This change is fully backward compatible and preserves structure during serialization/deserialization, enabling proper round-trip support for mixed tool configurations.
  • Refactored _save_pipeline_snapshot to consolidate try-except logic and added a raise_on_failure option to control whether save failures raise an exception or are logged. _create_pipeline_snapshot now wraps _serialize_value_with_schema in try-except blocks to prevent failures from non-serializable pipeline inputs.
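
A minimal sketch of run-time tool selection (the tools are toy examples, and an OpenAI API key is assumed to be configured):

from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool

def add(a: int, b: int) -> int:
    return a + b

def multiply(a: int, b: int) -> int:
    return a * b

int_params = {
    "type": "object",
    "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
    "required": ["a", "b"],
}
add_tool = Tool(name="add", description="Add two integers", parameters=int_params, function=add)
multiply_tool = Tool(name="multiply", description="Multiply two integers", parameters=int_params, function=multiply)

agent = Agent(chat_generator=OpenAIChatGenerator(), tools=[add_tool, multiply_tool])

# Restrict a single run to a subset of the configured tools, selected by name
result = agent.run(messages=[ChatMessage.from_user("What is 7 + 5?")], tools=["add"])
print(result["last_message"].text)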

🐛 Bug Fixes

  • Fixed the Agent run_async method to correctly handle async streaming callbacks, which previously triggered errors.
  • Prevent duplication of the last assistant message in the chat history when initializing from an AgentSnapshot.
  • We were setting response_format to None in OpenAIChatGenerator by default, which doesn't follow the API spec. We now omit the parameter if response_format is not passed by the user.
  • Ensure that the OpenAIChatGenerator is properly serialized when response_format in generation_kwargs is provided as a dictionary (for example, {"type": "json_object"}). Previously, this caused serialization errors; a round-trip sketch follows this list.
  • Fixed parameter schema generation in ComponentTool when using inputs_from_state. Previously, parameters were only removed from the schema if the state key and parameter name matched exactly. For example, inputs_from_state={"text": "text"} removed text as expected, but inputs_from_state={"state_text": "text"} did not. This is now resolved, and such cases work as intended.
  • Refactored SentenceTransformersEmbeddingBackend to ensure unique embedding IDs by incorporating all relevant arguments.
  • Fixed Agent to correctly raise a BreakpointException when a ToolBreakpoint with a specific tool_name is provided in an assistant chat message containing multiple tool calls.
  • The OpenAIChatGenerator implementation uses ChatCompletionMessageCustomToolCall, which is only available in OpenAI client >=1.99.2. We now require openai>=1.99.2.
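
A minimal round-trip sketch for the response_format serialization fix:

from haystack.components.generators.chat import OpenAIChatGenerator

generator = OpenAIChatGenerator(generation_kwargs={"response_format": {"type": "json_object"}})
data = generator.to_dict()
restored = OpenAIChatGenerator.from_dict(data)  # previously raised a serialization error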

💙 Big thank you to everyone who contributed to this release!

@anakin87, @bilgeyucel, @davidsbatista, @dfokina, @...

v2.19.0-rc1

20 Oct 10:37

Pre-release

v2.18.1

29 Sep 09:43

Release Notes

v2.18.1

⚡️ Enhancement Notes

  • Added tools to agent run parameters to enhance the agent's flexibility. Users can now choose a subset of tools for the agent at runtime by providing a list of tool names, or supply an entirely new set by passing Tool objects or a Toolset.

🐛 Bug Fixes

  • Fixed the Agent run_async method to correctly handle async streaming callbacks, which previously triggered errors.
  • Prevent duplication of the last assistant message in the chat history when initializing from an AgentSnapshot.
  • We were setting response_format to None in OpenAIChatGenerator by default, which doesn't follow the API spec. We now omit the parameter if response_format is not passed by the user.