Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Vertex AI RAG Engine, a component of the Vertex AI Platform, is a data framework for developing applications that use Retrieval-Augmented Generation (RAG). RAG augments the context of a large language model (LLM) with your own data.
A common challenge with LLMs is that they can't access private knowledge, such as your organization's data. With Vertex AI RAG Engine, you can enrich the LLM's context with your private information. This process helps the model reduce hallucinations and answer questions more accurately.
Combining your knowledge sources with an LLM's existing knowledge provides the model with better context. The improved context, along with the user's query, enhances the quality of the LLM's response. For example, to answer a question about a company's internal policy, a RAG system first retrieves the relevant policy document and then uses an LLM to generate an answer based on that document.
The following image illustrates the key concepts of the RAG process in Vertex AI RAG Engine.
The RAG process includes the following steps:
Data ingestion: Ingests data from various sources, such as local files, Cloud Storage, and Google Drive.
Data transformation: Transforms data in preparation for indexing, for example, by splitting it into chunks.
Embedding: Converts text into numerical representations (embeddings) that capture semantic meaning. Text with similar meanings has similar embeddings.
Data indexing: Creates an index, called a corpus, to structure the knowledge base for optimized searching.
Retrieval: Searches the indexed knowledge base to find information relevant to a user's query or prompt.
Generation: Adds the retrieved information as context to the original user query, which guides the generative AI model to produce factually grounded and relevant responses.
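The steps above can be sketched end to end with a toy example. The word-based chunker, bag-of-words "embedding", and cosine-similarity lookup below are illustrative stand-ins for Vertex AI RAG Engine's managed ingestion, embedding models, and corpus index, not the service's actual implementation.

```python
# Toy RAG pipeline: chunk -> embed -> index -> retrieve -> build prompt.
# The "embedding" is a simple word-count vector; a real system uses a learned
# embedding model (for example, Vertex AI text embeddings).
import math
from collections import Counter

def chunk(text, size=8, overlap=2):
    """Data transformation: split a document into overlapping word chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Embedding stand-in: map text to word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Data ingestion + indexing: the "corpus" is a list of (chunk, embedding) pairs.
docs = [
    "Employees may work remotely up to three days per week with manager approval.",
    "Expense reports must be filed within thirty days of the purchase date.",
]
corpus = [(c, embed(c)) for d in docs for c in chunk(d)]

# Retrieval: find the chunk most similar to the user's query.
query = "How many remote days are allowed per week?"
best_chunk, _ = max(corpus, key=lambda pair: cosine(embed(query), pair[1]))

# Generation: the retrieved chunk becomes context added to the original query.
prompt = f"Context: {best_chunk}\n\nQuestion: {query}\nAnswer based on the context."
print(best_chunk)  # -> "three days per week with manager approval."
```

In a real deployment, the index lives in a managed corpus and the final prompt is sent to a generative model; the point here is only how each stage feeds the next.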
Supported regions
Vertex AI RAG Engine is supported in the following regions:
| Region | Location | Description | Launch stage |
| --- | --- | --- | --- |
| us-central1 | Iowa | v1 and v1beta1 versions are supported. | Allowlist |
| us-east4 | Virginia | v1 and v1beta1 versions are supported. | GA |
| europe-west3 | Frankfurt, Germany | v1 and v1beta1 versions are supported. | GA |
| europe-west4 | Eemshaven, Netherlands | v1 and v1beta1 versions are supported. | GA |
Access to us-central1 requires you to be on an allowlist. To experiment with Vertex AI RAG Engine, you can use other available regions. If you need to use us-central1 for production traffic, contact vertex-ai-rag-engine-support@google.com to request access.
Vertex AI RAG Engine supports VPC Service Controls and CMEK. Data residency and AXT security controls aren't supported.

Submit feedback

To chat with Google support, go to the Vertex AI RAG Engine support group (https://groups.google.com/a/google.com/g/vertex-ai-rag-engine-support).

To send an email, use the email address vertex-ai-rag-engine-support@google.com.

What's next

To learn how to use the Vertex AI SDK to run Vertex AI RAG Engine tasks, see RAG quickstart for Python.

To learn about grounding, see Grounding overview.

To learn more about the responses from RAG, see Retrieval and Generation Output of Vertex AI RAG Engine.

To learn about the RAG architecture, see Infrastructure for a RAG-capable generative AI application using Vertex AI and Vector Search, and Infrastructure for a RAG-capable generative AI application using Vertex AI and AlloyDB for PostgreSQL.

Last updated 2025-08-28 UTC.