MongoDB Blog
Streamlining Editorial Operations with Gen AI and MongoDB
Are you overwhelmed by the sheer volume of information and the constant pressure to produce content that truly resonates? Audiences constantly demand engaging and timely topics, and as the daily influx of information grows, it's becoming increasingly tough to identify what's interesting and relevant. Consequently, teams spend more time researching trends, verifying sources, and managing tools than actually creating compelling stories. This is where artificial intelligence enters the media landscape to offer new possibilities. Tapping into AI capabilities calls for a flexible data infrastructure that can streamline content workflows, provide real-time insights, and help teams stay focused on what matters most. In this blog, we will explore how combining gen AI with modern databases, such as MongoDB, can make editorial operations more efficient.

Why are your content ideas running dry?

Creative fatigue significantly impacts content production. Content leads face constant pressure to generate fresh ideas under tight deadlines, leading to creative blocks. In fact, according to a recent HubSpot report, 16% of content marketers struggle to find compelling new content ideas. This pressure often compromises work quality due to time constraints, leaving little room for delivering authentic content. Another major hurdle is identifying credible and trending topics quickly. To find reliable information, a lot of time is spent on research and discovery rather than actual creation. This leads to missed opportunities in spotting what's trending and reduces audience engagement as well. It also presents a clear opportunity for AI, combined with modern databases, to deliver a transformative solution.

Using MongoDB to streamline content operations

MongoDB provides a flexible, unified storage solution through its collections for modern editorial workflows.

The need for a flexible data infrastructure

Developing an AI-driven publishing tool requires a system that can ingest, process, and structure a high volume of diverse content from multiple sources. Traditional databases often struggle with this complexity. Such a system must ingest data from many sources, dynamically categorize content by industry, and perform advanced AI-enabled searches at application scale. Combining a flexible document-oriented database with embedding techniques transforms varied content into structured, easily retrievable insights. Figure 1 below illustrates this integrated workflow, from raw data ingestion to semantic retrieval and AI-driven topic suggestions.

Figure 1. High-level architectural diagram of the Content Lab solution, showing the flow from the front end through microservices, backend services, and MongoDB Atlas to AI-driven topic suggestions.

Raw data into actionable insights

We store a diverse mix of unstructured and semi-structured content in dedicated MongoDB collections such as news, Reddit posts, suggestions, userProfiles, and drafts, organized by topic, vertical (e.g., business, health), and source metadata for efficient retrieval and categorization. These collections are continuously updated from external APIs like NewsAPI and Reddit, alongside AI services (e.g., AWS Bedrock, Anthropic Claude) integrated via backend endpoints. By leveraging embedding models, we transform raw content into organized, meaningful data, stored by category (e.g., business, health) in the form of vectors.
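As an illustration of this ingestion step, here is a minimal sketch of storing one ingested article together with its embedding and category metadata. The database, collection, and field names are illustrative assumptions, not the solution's actual schema, and the embedding values stand in for the output of whichever embedding model is used.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
news = client["content_lab"]["news"]  # hypothetical database/collection names

article = {
    "title": "Healthcare startups embrace open-source AI",
    "vertical": "health",  # used for categorization and filtering
    "source": {"name": "NewsAPI", "url": "https://example.com/article"},
    "ingested_at": datetime.now(timezone.utc),
    # Truncated example vector; in practice this comes from your embedding model
    # (e.g., a Bedrock or Voyage AI call).
    "embedding": [0.021, -0.117, 0.403],
}
news.insert_one(article)
```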
MongoDB Atlas Vector Search, combined with the aggregation pipeline, enables fast semantic retrieval, allowing users to query abstract ideas or keywords and get back the most relevant, trending topics ranked by a similarity score. Generative AI services then draw upon these results to automate the early stages of content development, suggesting topics and drafting initial articles to substantially reduce creative fatigue.

From a blank page to first draft with gen AI and MongoDB

Once a user chooses a topic, they're taken to a draft page, as depicted in the third step of Figure 2. Users are then guided by a large language model (LLM)-based writing assistant and supported by Tavily's search agent, which pulls in additional contextual information. MongoDB continues to handle all associated metadata and draft state, ensuring the user's entire journey stays connected and fast.

Figure 2. Customer flow pipeline and behind-the-scenes.

We also maintain a dedicated userProfiles collection, linked to both the drafts and chatbot systems. This enables dynamic personalization so that, for example, a Gen Z user receives writing suggestions aligned with their tone and preferences. This level of contextual adaptation improves user engagement and supports editorial consistency. User-generated drafts are stored as new entries in a dedicated drafts collection, which provides persistent storage, version control, and later reuse, all of which are essential for editorial workflows. MongoDB's flexible schema lets us evolve the data model as we add new content types or fields without migrating data.

Solving the content credibility challenge

Robust data management directly addresses content credibility. When we generate topic suggestions, we capture and store the source URLs within MongoDB, embedding these links directly into the suggestion cards shown in the UI. This allows users to quickly verify each topic's origin and reliability. Additionally, by integrating Tavily, we retrieve related contextual information along with its source URLs, further enriching each suggestion. MongoDB's efficient handling of complex metadata and interrelated data ensures that editorial teams can consistently and confidently vet content sources, delivering trustworthy, high-quality drafts. By combining Atlas Vector Search, flexible collections, and real-time queries, MongoDB goes a long way toward an end-to-end content system that's agile, adaptable, and intelligent. The next section shows how this translates into a working editorial experience.

From raw ideas to ready stories: Our system in action

With our current solution, editorial teams can rapidly move from scattered ideas to structured, AI-assisted drafts, all within a smart, connected system. The combination of generative AI, semantic search, and flexible data handling makes the workflow faster, more fluid, and less dependent on manual effort. Consequently, the focus shifts back to creativity, as it becomes easy to discover relevant topics from verified sources and produce personalized drafts. Adaptability and scalability are essential to building intelligent systems that deliver great results in the content space. As editorial demands keep growing, teams need an infrastructure that can ingest diverse data, produce insights, and support real-time collaboration.
This system illustrates how AI, coupled with a flexible, document-oriented backend, can help teams reduce fatigue, enhance quality, and accelerate production without adding complexity. It's not just about automation; it's about providing a more focused, efficient, and reliable path from idea to publication. Here are a few next steps to help you explore the tools and techniques behind AI-powered editorial systems:

- Dive deeper with Atlas Vector Search: Explore our comprehensive tutorial to understand how Atlas Vector Search powers semantic search and enables real-time insights from your data.
- Discover real-world applications: Learn more about how MongoDB is transforming media operations by reading the AI-Powered Media article.
- Check out the MongoDB for Media and Entertainment page to learn more about how we meet the dynamic needs of modern media workflows.
Driving Airport Efficiency with MongoDB and Dataworkz
In 2024, airports worldwide supported over 40 million flights.1 These millions of flights translate into intense activity on the ground, where every flight, its passengers, and its cargo rely on a complex network of supporting operations. These include baggage and cargo handling, aircraft towing, refueling, catering, maintenance, and the coordination of various air-side vehicles. A single aircraft turnaround can require a ground operations team of roughly 20 people. Less experienced staff are also often brought in to meet demand, especially during peak travel seasons when operations move at an intense pace. In such conditions, the likelihood of human error increases, threatening the safety of personnel, the security of multi-million-dollar aircraft and support vehicles, and the overall efficiency of airport operations, causing flight delays.

Flight delays remain one of the airline industry's most persistent challenges. Approximately 30,000 flights are delayed each day, with each delay averaging 17.3 minutes. Not only is this a major inconvenience to passengers, but the operational impact on the airline is severe. For a widely used aircraft like the Airbus A321, a 15-minute delay during at-gate, taxiing, or en-route operations can cost airlines around €3,030 per flight (roughly $3,500). Across thousands of flights, the financial losses and cascading schedule disruptions are enormous.

We can address this issue through a data-driven transformation of ground operations. This blog post explores a smart airport operations application, powered by MongoDB Atlas and Dataworkz, that can optimize processes and prevent costly delays. Prevention is critical, particularly in environments where rising flight volumes are met with accelerated recruitment and limited training time. The solution leverages Google Cloud and MongoDB Atlas to power an agentic voice assistant that guides operators through checklists, retrieves real-time answers from embedded manuals via Dataworkz's retrieval-augmented generation (RAG) application, and logs every action for audit and optimization. Using Vertex AI for speech-to-text, text-to-speech, and natural language processing, Atlas Vector Search for context-aware retrieval, and a flexible BSON schema for structured and unstructured data, the system enables natural, hands-free interaction that ensures compliance and increases operational efficiency.

Solution design

MongoDB provides a flexible, secure, and scalable database with high performance, making it well suited for AI applications. Its document model allows for the storage of diverse and unstructured data, which is common in AI workloads that combine contextualized outputs derived from integrated data, monitoring information, and log files. MongoDB's horizontal scalability ensures that large volumes of ground ops data are handled seamlessly, while its robust aggregation framework simplifies data manipulation for AI application development. Dataworkz serves as a managed RAG platform built for gen AI applications, providing an agentic AI framework and an AI-optimized operational data layer (ODL) foundation, enabling a comprehensive view through seamless data integration into MongoDB Atlas. This lets users configure their sources and destinations, bringing data together, processing it, and applying AI/ML.
The Dataworkz platform also provides monitoring and audit capabilities, managed authentication, and application lifecycle management for production readiness, and it integrates various data sources via pre-built connectors. In this solution, Dataworkz manages the end-to-end RAG workflow, from ingesting and embedding technical manuals and regulations using Voyage AI models, to executing real-time, context-aware queries with MongoDB Atlas Vector Search for highly accurate information retrieval. Dataworkz's RAG builder, with MongoDB as its core data infrastructure, empowers the voice assistant to seamlessly integrate the vast (and often complex) information contained within aircraft and airline safety manuals, transforming it into an easily accessible and actionable resource for the ground crew.

Figure 1 illustrates an architecture that integrates Google Cloud and MongoDB Atlas to enable an agentic voice assistant for operational checklists. MongoDB Atlas manages agent memory, including checklists, steps, and RAG-generated logs, while Google Cloud powers speech interfaces and large language model (LLM) inference via Vertex AI.

Figure 1. Smart airport ground operations solution architecture.

This solution addresses key requirements for improved operational efficiency. The following points outline its core functionalities, benefits, and interdependencies, including adherence to regulatory requirements:

- Real-time information retrieval built with Dataworkz and powered by MongoDB Atlas Vector Search: When an operator asks a question (e.g., "Where can I find the Auxiliary Power Unit (APU)?"), the voice assistant doesn't just rely on pre-programmed responses. Instead, Dataworkz's RAG application queries the relevant sections of the embedded manuals and regulations in real time, delivering context-specific answers. This process relies heavily on MongoDB Atlas Vector Search. When a user's natural-language question is sent to the Dataworkz RAG app, Atlas Vector Search efficiently compares it against vectorized representations of the manual content stored in a dedicated MongoDB collection, while the interaction is tracked in the operation session's logs. This allows fast, contextually relevant retrieval of the most accurate and up-to-date information.

- Natural language processing (NLP): Dataworkz employs advanced NLP capabilities to process both spoken and text-based queries from operators, with speech handled by Vertex AI. Additionally, it processes technical manuals, regulations, and other airline documentation using Voyage AI embedding models. MongoDB stores both the raw and vectorized text from the manuals, streamlining operations; this data is readily retrievable by the assistant to provide context-aware responses to the end user. This comprehensive approach enables natural, conversational interaction, eliminating the need for operators to use rigid commands or specific technical jargon and ensuring a seamless user experience.

- Checklists and MongoDB for audit trails: When an operator initiates a voice-assisted checklist, the application dynamically feeds the checklist items and associated procedural details directly to the voice assistant. As the operator confirms completion of each step, the system can provide instant validation, read the next steps, or offer additional context from the manuals if a discrepancy is detected. Crucially, every aspect of session activity is meticulously logged and stored in MongoDB Atlas.
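To make the audit trail concrete, here is a minimal sketch of what one logged session document might look like; the field names are illustrative assumptions rather than the solution's actual schema, and the structure is described in more detail below.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

sessions = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["ground_ops"]["sessions"]

session_doc = {
    "sessionId": "A321-TURNAROUND-2025-06-01-0742",
    "checklist": "pushback_preparation",
    "startedAt": datetime.now(timezone.utc),
    # Nested array of objects, one per checklist step, capturing completion status.
    "steps": [
        {"step": 1, "description": "Verify chocks removed", "status": "completed",
         "confirmedAt": datetime.now(timezone.utc)},
        {"step": 2, "description": "Confirm tow bar attached", "status": "pending"},
    ],
    # Answers retrieved from the embedded manuals during the session.
    "ragQueries": [
        {"question": "Where can I find the APU?", "sourceManual": "A321 ground handling manual"}
    ],
}
sessions.insert_one(session_doc)
```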
Each session is represented as a single JSON document, providing a flexible schema structure to capture nested data. This includes not only the precise completion times of each session but also session-specific data from checklists, represented as nested arrays of objects. Each object in the array details the completion status of an individual step of the selected checklist operation, along with the session ID and timestamp. This ensures a complete and accurate historical record of all operations, providing crucial insights for analysis, auditing, and continuous process optimization, in terms of both cost and safety.

- Data-driven insights via MongoDB: By providing immediate access to comprehensive and contextualized information, the solution can significantly reduce training time and cognitive load for ground crews. Crews no longer need to memorize every detail of every manual; instead, they can rely on the voice assistant as an intelligent, on-demand knowledge base that delivers context-aware answers in real time. Detailed logs can also be used to identify areas where additional training might be beneficial or where manual content could be enhanced, enabling continuous optimization through data-driven insights.

- Error prevention and compliance: Guided, compliant checklist steps turn situations where hands are occupied into easily manageable ones, minimizing the risk of human error stemming from misremembered procedures or difficulty locating information.

In the aviation sector, delays caused by ground operations quickly lead to significant financial strain on airlines. By combining MongoDB and Dataworkz, airport operations can be made faster, safer, and more efficient. MongoDB's flexible document model, real-time processing, and advanced vector search capabilities give AI application developers the ability to provide accurate, voice-assisted guidance on demand, exactly when and where it's needed.

Figure 2. The aircraft ground operations demo application in action.

Replicate this solution in your own environment by following the step-by-step guide in our repository. To learn more about MongoDB's role in the automotive industry, please visit our manufacturing and automotive page. Or, build your first agentic workflows with Dataworkz.

1. IATA (February 26, 2025). "IATA Releases 2024 Safety Report." Press Release No. 7. Montreal, Canada. https://www.iata.org/en/pressroom/2025-releases/2025-02-26-01/
New Benchmark Tests Reveal Key Vector Search Performance Factors
Search at scale is challenging. As powerful as vector search is, it can be tough to know how to properly weigh key factors like accuracy, cost, and throughput for larger workloads. We recently released the MongoDB Benchmark for Atlas Vector Search, which outlines crucial performance optimization strategies for vector search, providing a comprehensive guide to achieving optimal results with large-scale datasets. The primary goal of our guide is to significantly reduce friction for your first vector test at scale (>10M vectors) when evaluating performance for Atlas Vector Search. With this new guide, our aim is to provide more context around how to use the benchmark, to explore the dataset (including the factors considered), and to summarize and contextualize the results. Let's take a closer look!

A note on benchmarking data

Every good presentation includes the requisite safe harbor slide, and the art and science of benchmarking is no different. Embarking on a large-scale vector workload can present significant hurdles stemming from a lack of accurate information and the inherent friction of initial benchmarks. Furthermore, the landscape of vector search and embedding models is rapidly evolving, and information can become outdated quickly, leading users down inefficient or incorrect paths. Without clear, up-to-date guidance, users can struggle to predict system behavior, optimize configurations, and confidently allocate resources. It's also worth noting that numerous factors (quantization, dimensionality, filtering, search node configuration, concurrency, sharding, and more) interact in complex ways. Understanding these interactions and their specific impact on a particular workload requires deep, accurate insights. Without this, users might optimize one aspect only to inadvertently degrade another. This informational vacuum—coupled with the considerable setup overhead, complex parameter tuning, and the cost of experimentation involved in running the first benchmark—creates a substantial barrier to proving out and scaling a solution. Nonetheless, we feel that these benchmarks give our customers confidence in POCs and a starting point to work from (as opposed to having no compass at all). With these factors in mind, let's jump into an overview of the dataset.

A look at the dataset

The core of this performance analysis revolves around tests conducted on subsets of the Amazon Reviews 2023 dataset, which contains 48M item descriptions across 33 product categories. The dataset was chosen because it provides a realistic, large-scale e-commerce scenario and offers rich data, including user reviews (ratings, text, helpfulness votes), item metadata (price, images), and detailed item names and descriptions, which are ideal to search over. For the variable-dimension tests, subsets of 5.5 million items were used, embedded with voyage-3-large to produce 2048-dimensional vectors. Views were then created to slice these into 1024-, 512-, and 256-dimensional vectors for testing different dimensionalities. For the large-scale, high-dimensional test, a 15.3 million-item subset—also embedded with 2048-dimensional vectors from voyage-3-large—was used. One of the key takeaways from the report is that at the highest dimensionality (15.3M vectors using voyage-3-large embeddings at 2048 dimensions), Atlas Vector Search with scalar or binary quantization configured retains 90–95% accuracy with less than 50 ms of query latency.
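For orientation, quantization is chosen when the vector index is defined. Below is a minimal sketch of creating such an index with PyMongo; the database, collection, field, and index names are hypothetical, and the definition follows the documented Atlas Vector Search index format.

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

collection = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["benchmark"]["items"]

index_model = SearchIndexModel(
    name="vector_index",
    type="vectorSearch",
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",       # field holding the voyage-3-large vectors
                "numDimensions": 2048,
                "similarity": "dotProduct",
                "quantization": "scalar",  # or "binary" for the smallest memory footprint
            }
        ]
    },
)
collection.create_search_index(index_model)
```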
One item of note is that binary quantization can have higher latency when the number of candidates requested is in the hundreds, due to the additional cost of rescoring with full-fidelity vectors, but it may still be preferable for many large-scale workloads because of its cost effectiveness.

Figure 1. Binary versus scalar quantization performance.

Methodology: Benchmarking with the Amazon reviews dataset

Now that we've talked a little about the data itself and the information it includes, let's outline some of the key factors that impact performance for Atlas Vector Search, and how we configured our benchmark to test them. It's also important to acknowledge why these variables are critical: not every customer will be optimizing their search for the same thing. With that in mind, we will also attempt to identify the interplay and trade-offs between them. While this list is not exhaustive (see the full report for more details), let's review some of the key performance factors:

- Recall: Recall (a measure of search accuracy) is significantly impacted by quantization and vector dimensionality. The report highlights that while scalar quantization generally starts with higher recall, binary quantization can approach similar accuracy levels by increasing numCandidates, though this often incurs higher latency due to an additional rescoring step. Furthermore, higher-dimensional vectors (1024d and 2048d) consistently maintain better recall, especially with larger datasets and quantization, compared to lower dimensions (256d and 512d), which struggle to exceed 70-80% recall.

- Sizing and cost: The table in the benchmark details the resources required (RAM, storage) and associated costs for different search node tiers based on three test cases involving varying dataset sizes, vector dimensions, and quantization methods (scalar or binary). The guide works through a sample dataset, noting that resource requirements scale linearly and that quantization substantially reduces memory requirements.

- Concurrency and throughput: Throughput is evaluated with multiple requests issued concurrently. Scalar quantization generally achieves higher queries per second (QPS) across various limit values due to less work per query and no rescoring. Concurrency bottlenecks are often observed, resulting in higher latency; scaling out the number of search nodes or increasing available vCPUs is recommended to resolve these bottlenecks and achieve higher QPS.

Figure 2. Node tiers for different test cases.

Optimizing your vector search performance

This benchmark report thoroughly examines the performance of MongoDB Atlas Vector Search across various configurations and large datasets, specifically the Amazon Reviews 2023 dataset. It explores the impact of factors such as quantization (scalar and binary), vector dimensionality, filtering, search node configurations, binData compression, concurrency, and sharding on recall, latency, and throughput. While there is never a "silver bullet," because everyone's definition of search "success" is different, we wanted to highlight some of the levers to consider and methods to get the most out of your own deployment. Our goal is to provide key considerations for how to evaluate and improve your own vector search performance, and to help you properly weigh and contextualize the key factors. Ready to optimize your vector search experience? Explore the guide in our documentation. Run it yourself with our GitHub repo.
Powering Long-Term Memory for Agents With LangGraph and MongoDB
We're excited to introduce the MongoDB Store for LangGraph—a powerful integration that brings flexible and scalable long-term memory to AI agents. This new integration between MongoDB and LangGraph, LangChain's open-source agent orchestration framework, allows agents to remember and build on previous interactions across multiple sessions instead of only retaining memory for the current session. The result is more intelligent, context-aware agentic systems that learn and improve over time. This new integration complements MongoDB's existing checkpointer integration, which handles short-term memory and persistent conversation history. Together, the MongoDB Store for LangGraph and MongoDB's support for checkpointers provide a complete solution for building production-ready, memory-enabled agents.

The need for agent memory

An AI agent is a system designed to take actions or make decisions based on input, often using tools and reasoning to complete tasks. By default, agents don't retain memory between conversations, which severely constrains what they can accomplish. Agent memory (and memory management) is a computational exocortex for AI agents. It is a dynamic, systematic process that integrates an agent's large language model (LLM) memory (context window and parametric weights) with a persistent memory management system to encode, store, retrieve, and synthesize knowledge and experiences. Agent memory is typically divided into two main types: short-term memory and long-term memory. In a memory context, LangGraph uses "threads" to represent individual conversations or sessions. Short-term memory, managed through thread-scoped checkpointers that MongoDB supports, maintains context within a given session. While this preserves conversation continuity and manages history, it doesn't help agents learn continuously from the past across different conversations to adapt and optimize their behavior over time. This is why we introduced the MongoDB Store for LangGraph, enabling your agents to retain memories across conversations through a cross-thread memory store.

Figure 1. Short and long-term memory integration between LangGraph and MongoDB.

MongoDB Store: Enabling cross-thread long-term memory

The new langgraph-store-mongodb package introduces a MongoDBStore class. Available now through PyPI, this production-ready integration provides:

- Cross-thread persistence: Store and recall information across different conversation sessions and user interactions, allowing agents to build on previous knowledge.
- Native JSON structure: LangGraph stores long-term memories as JSON documents, which map directly to MongoDB documents. Each memory is organized using namespaces and a key-value structure. This enables the use of MongoDB's native, optimized data formats and search capabilities for efficient retrieval.
- Vector search capabilities: Leverage MongoDB Atlas Vector Search for semantic memory retrieval based on meaning, not just keyword matching.
- Asynchronous support: Both synchronous and asynchronous operations for high-performance applications.
- Automatic connection management: Robust connection pooling and error handling to ensure reliability.
- Optimized TTL indexes: MongoDB's Time-to-Live (TTL) indexes are integrated with LangGraph's TTL system, allowing automatic removal of stale or outdated data. This improves retrieval performance, reduces storage costs, and ensures the system "forgets" obsolete memories efficiently.

Ready to give your AI agents persistent long-term memory?
The langgraph-store-mongodb package is available now:

pip install langgraph-store-mongodb

The MongoDB Store for LangGraph enables developers to build more powerful agents for different use cases, including:

- Customer support agents: Build agents that remember customer preferences, past issues, and resolution patterns across multiple support channels.
- Personal assistant applications: Build agents that learn user habits and preferences to provide increasingly personalized experiences.
- Enterprise knowledge management: Create agents that accumulate organizational knowledge and can retrieve relevant information semantically.
- Multi-agent systems: Enable agent teams to share learned experiences and coordinate through persistent memory.

Why MongoDB for agent memory?

Effective agentic memory requires comprehensive mechanisms for storing, retrieving, updating, and deleting memories. MongoDB Atlas provides a unified database that meets all these complex requirements:

- Flexible document model: Store complex, nested memories as rich JSON, matching how agents naturally read, organize, and update evolving information.
- Semantic search: Native vector search enables retrieval by meaning, not just exact matches.
- State-of-the-art models: Voyage AI provides embedding models and rerankers for cutting-edge memory retrieval.
- Scalable architecture: Distributed architecture, workload isolation, autoscaling, and automatic sharding capabilities for scaling AI agent memory.
- Enterprise security: Fine-grained role-based access control (RBAC) allows precise management of both access scope (specific services or databases) and access type (read-only or read-write).

MongoDB Atlas and LangChain: A complete solution for AI agent memory

Short-term memory provides an agent with immediate context: current conversation state, prior exchanges within that session, or shared memory for coordination in multi-agent systems. The most common form of short-term memory is working memory—an active, temporary context accessible during a session. MongoDB's integration with LangGraph checkpointers supports this by persisting and restoring conversation states. Other short-term memory implementations include semantic caches, such as MongoDB's semantic cache integration with LangChain, which stores recent prompts and LLM responses for retrieval when similar queries occur. Shared memory is also used in multi-agent systems to provide a common space for coordination and information sharing.

Long-term memory serves as the agent's knowledge base, storing diverse kinds of information for future use. It includes several functional types, each requiring specific storage and retrieval strategies:

- Episodic memory: Captures specific events and interactions, such as conversation history or summaries of key occurrences with metadata (e.g., timestamps, participants). For instance, a customer support agent can use this to recall a user's past issues and offer personalized responses.
- Procedural memory: Records instructions or rules for recurring tasks. A typical implementation is a social content generator agent that remembers past feedback on writing style and formatting to improve its process.
- Semantic memory: Remembers general knowledge, facts, and concepts. This is often implemented through retrieval-augmented generation (RAG), where data is stored as vector embeddings and retrieved based on semantic similarity.
- Associative memory: Stores key entities and relationships between different pieces of information, enabling an agent to identify patterns and make inferences by navigating these connections. It's often implemented using graph structures that support efficient exploration of relationships. One practical approach is GraphRAG.

The MongoDB Store for LangGraph supports these memory types through flexible filtering and semantic search, making it a versatile approach for building reliable long-term memory in agents. LangChain also provides LangMem, a toolkit featuring pre-built tools designed specifically for extracting and managing procedural, episodic, and semantic memories. LangMem integrates natively with LangGraph, streamlining the memory engineering process. For developers seeking a straightforward approach to using various memory types with MongoDB, explore this comprehensive tutorial for implementing MongoDB alongside LangGraph and LangMem.

The future of intelligent agents

With the new MongoDB Store for LangGraph, we're enabling developers to build AI agents that can learn and adapt. Agents that remember user preferences, learn from mistakes, and build knowledge over time will transform how we interact with AI systems. The combination of LangGraph's sophisticated orchestration capabilities with MongoDB's flexible, scalable storage creates unprecedented opportunities for building intelligent, persistent AI agents that feel truly alive and responsive. Ready to build memory-enabled agents with LangGraph and MongoDB Atlas? Get started with the documentation.
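As a quick illustration of cross-thread memory, here is a minimal sketch of saving and recalling a memory with the store. The import path and constructor shown are assumptions based on the package description (check the package documentation for the exact API); put, get, and search are LangGraph's standard BaseStore methods.

```python
from pymongo import MongoClient
from langgraph.store.mongodb import MongoDBStore  # assumed import path for langgraph-store-mongodb

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
store = MongoDBStore(client["agent_memory"]["memories"])  # assumed constructor; see package docs

# Memories are organized by namespace and key; this one is scoped to a single user.
namespace = ("user_123", "preferences")
store.put(namespace, "writing_style", {"tone": "casual", "format": "short paragraphs"})

# In a later conversation (a different thread), the agent can recall it directly...
memory = store.get(namespace, "writing_style")
print(memory.value if memory else "no memory yet")

# ...or search across the namespace (semantic search requires an embedding index to be configured).
results = store.search(namespace, query="how does this user like their drafts written?", limit=3)
```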
Building an Agentic AI Fleet Management Solution
Artificial intelligence is revolutionizing the manufacturing and motion industry, with AI-powered solutions now capable of delivering precise, real-time insights that can optimize everything from route planning to predictive maintenance. Modern vehicles can generate an overwhelming amount of data—nearly 25 GB per hour through a diverse range of sensors, according to an article from S&P Global Mobility. Contextualizing this data with user feedback, maintenance records, and technical knowledge becomes increasingly challenging as the system scales. These complexities can create inefficiencies, introduce overhead when processing data, and drive up operational costs, hindering the full potential of AI-driven systems. An efficient fleet management architecture can address these problems by reducing redundancies, optimizing data retrieval processes, and enabling the seamless integration and use of embeddings.

MongoDB's flexible document model is a perfect fit for this approach. Unlike legacy SQL databases, MongoDB excels at managing unstructured, semi-structured, and structured data. This capability allows fleet management software to ingest and process diverse data types, including vehicle signal data, geospatial zones, fleet configurations, query logs, route telemetry, maintenance records, and real-time performance scores. In this post, we will use various MongoDB Atlas features—such as geospatial query operations, time series collections, Atlas Charts, and aggregation pipelines—to create an agentic AI-powered fleet management system. This system demonstrates how an AI agent can enable intelligent data processing, providing real-time, context-aware responses to user queries in a streamlined manner.

Fleet management software with AI overview

A traditional fleet management system provides features like resource planning, route optimization, and maintenance scheduling, which work together to improve cost management, regulatory compliance, and overall operational effectiveness (OEE). Our solution harnesses the power of MongoDB's flexible document schema, time series collections, and geospatial query support to give fleet managers the ability to query, filter, and operate on data effectively. Additionally, an AI agent helps users obtain actionable insights through a chat-driven interface.

Figure 1. Architecture of the solution.

The AI agent has a chatbot UI. The data captured by the agent triggers an orchestration service, which then calls various tools as required and gets data from MongoDB to complete its task. In Figure 1, the telemetry data from our cars is stored in MongoDB in time series collections via microservices. In addition to the telemetry data, we store static car information (e.g., brand, model, year, and VIN) and user configurations, such as past queries and fleet settings. All of this data is leveraged by the agentic system to answer user queries and to provide deeper context for similar queries in the future. Figure 2 shows the user interface of the agentic system, where queries can be submitted directly. Filters allow users to narrow results by fleet, time range, or geozone, while the AI agent delivers answers using real-time and historical data.

Figure 2. Demo chat section.

When a user types a question into the chat box, the AI agent embeds the query and its metadata and searches for similar prior questions in the historical recommendations collection.
Depending on the tools required, the system accesses contextual data across collections, such as time series metrics, geospatial locations, or maintenance logs, through aggregation pipelines. Once the relevant data is assembled, the AI synthesizes the information into actionable insights, providing the user with an accurate and informative response.

MongoDB features for a fleet management system

RAG framework with MongoDB Vector Search

Agents powered by retrieval-augmented generation (RAG) are transforming fleet management systems by seamlessly integrating real-time contextual information during response generation. MongoDB's flexible NoSQL model complements RAG with fast, low-latency access to embedded document data. Combined with Voyage AI's cost-efficient embedding model, MongoDB accelerates vector search workflows for smarter decision-making.

MongoDB Atlas Vector Search empowers the agent to operate proactively by connecting user queries with relevant insights stored in the database. For instance, when a fleet manager asks about the current positions of vehicles, the agent leverages MongoDB's vector search to match the query against historical recommendations. If similar queries already exist, the agent retrieves pre-existing results instantly, reducing both latency and operational costs. In situations where no matching results are found, the agent complements vector search by invoking LLMs to dynamically generate answers, ensuring fleet managers receive accurate and actionable responses. This streamlined workflow, powered by MongoDB's unique combination of vector search and flexible data modeling, allows fleet managers to act on real-time, context-aware insights. From analyzing geospatial patterns to addressing systemic vehicle issues, MongoDB enables the agent to simplify complex decision-making while maintaining efficiency. By combining predictive AI capabilities with an optimized, scalable database, this solution transforms fleet management into a more proactive, data-driven process.

Polymorphism

MongoDB's document model allows storing polymorphic data structures within the same collection, meaning documents can vary in structure and embed other documents. This flexibility enables our demo to optimize workflows by storing application-specific metadata tailored to fleet operations. For instance, the historical_recommendations collection stores query and recommendation histories generated by the system's AI engine, with the ability to embed metadata dynamically, such as the initial question asked, the tool chosen, and the results it returned. This streamlines read operations and gives the AI agent more context for future queries. For example, a document in this collection might appear as follows:

Figure 3. Document model of historical_recommendations.

By accommodating this variability in structure without sacrificing efficiency, MongoDB adapts to the dynamic data storage requirements inherent in polymorphic workflows. By embedding detailed context and avoiding null values, the system can streamline read operations and provide richer context to the AI agent for future queries.

Time series collections

MongoDB's time series collections simplify working with time series data. These specialized collections provide several benefits, including automatic creation of compound indexes for faster data retrieval, reduced disk usage, and lower I/O overhead for read operations.
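For example, a vehicle-telemetry time series collection might be created like this; the database, collection, and field names are hypothetical, while the timeseries options shown are standard MongoDB parameters.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["leafy_fleet"]

# Create a time series collection: MongoDB buckets measurements by time and metadata.
db.create_collection(
    "vehicle_telemetry",
    timeseries={
        "timeField": "timestamp",   # required: when the measurement was taken
        "metaField": "vehicle",     # optional: identifies the source (e.g., VIN, fleet)
        "granularity": "seconds",   # hint matching how frequently sensors report
    },
)

db.vehicle_telemetry.insert_one({
    "timestamp": datetime.now(timezone.utc),
    "vehicle": {"vin": "1HGCM82633A004352", "fleet": "north-america-demo"},
    "speedKmh": 87.5,
    "engineTempC": 92.1,
})
```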
This makes time series collections highly efficient for managing time-stamped data, such as the constant stream of sensor data from vehicles in our application. With these capabilities, fleet managers get near real-time access to data, empowering AI agents to rapidly extract actionable insights for fleet management. In this demo, MongoDB optimizes query efficiency in our time series collections using its bucketing mechanism. This mechanism groups multiple data points within the same time range into compressed blocks, reducing the number of documents scanned during queries. By grouping data points this way, bucketing minimizes read operations and disk usage, enabling faster range queries and ensuring sustained, optimized cluster performance, even under a humongous load.

Geospatial queries

MongoDB's native support for geospatial queries enables seamless integration of robust location-based functionality. The ability to handle complex geographic data is a powerful tool for industries relying on real-time, location-based decision-making. In our demo, this capability is leveraged to locate vehicles under various conditions, such as identifying vehicles near or inside a specified geofence, while filtering by maximum or minimum distance. Geospatial queries can also be incorporated directly into aggregation pipelines, enhancing the AI-driven workflows powered by our AI agent (a short sketch of such a query appears at the end of this post).

Key takeaways

MongoDB enables fleet managers to efficiently gather, process, and analyze data to uncover actionable insights. These capabilities empower managers to optimize operations, enhance vehicle oversight, and implement smarter, data-driven strategies that drive efficiency and performance. Visit MongoDB Atlas to start modernizing your fleet management system. Ready to transform your fleet management operations? Unlock real-time insights, optimize systems, and make smarter decisions with MongoDB's advanced features. If you're interested in exploring how MongoDB enables intelligent fleet management, check out our Leafy Fleet GitHub repository. Access the Leafy Fleet on GitHub. Additionally, dive deeper into best practices for modeling connected vehicle signal data and learn how MongoDB's flexible data model simplifies telemetry management at scale. Read the blog post.
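As referenced above, here is a minimal sketch of the kind of geofence query described in the geospatial section. The collection, field names, and coordinates are hypothetical; $geoWithin and $centerSphere are standard MongoDB geospatial operators.

```python
from pymongo import MongoClient

db = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["leafy_fleet"]
vehicles = db["vehicle_status"]  # hypothetical collection with a GeoJSON "location" field

# A 2dsphere index enables efficient geospatial queries on the location field.
vehicles.create_index([("location", "2dsphere")])

# Find vehicles inside a circular geofence: center (lon, lat) and a 5 km radius,
# expressed in radians (kilometers divided by Earth's radius of ~6378.1 km).
geofence_query = {
    "location": {
        "$geoWithin": {"$centerSphere": [[-73.98, 40.76], 5 / 6378.1]}
    }
}
for vehicle in vehicles.find(geofence_query).limit(10):
    print(vehicle["_id"], vehicle["location"]["coordinates"])
```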
Unlock Multi-Agent AI Predictive Maintenance with MongoDB
The manufacturing sector is navigating a growing number of challenges: evolving customer demands, intricate software-mechanical product integrations, just-in-time global supply chains, and a shrinking skilled labor force. Meanwhile, the entire sector is working under intense pressure to improve productivity, manage energy consumption, and keep costs in check. To stay competitive, the industry is undergoing a digital transformation—and data is at the center of that shift. Data-driven manufacturing offers a powerful answer to many of these challenges. On the shop floor, one of the most critical and high-impact applications of these strategies is predictive maintenance. Downtime isn't just inconvenient—it's expensive. For example, every unproductive hour in the automotive sector now costs $2.3 million, according to Siemens' "The True Cost of Downtime 2024" report. For manufacturers across all sectors, predictive maintenance is no longer optional. It's a foundational pillar of operational excellence.

At its core, predictive maintenance is about using data to anticipate machine failures before they happen. It began with traditional statistical models, evolved with machine learning, and is now entering a new era. As equipment ages and failure behaviors shift, models must adapt. This has led to the adoption of more advanced approaches, including generative AI with retrieval-augmented generation (RAG) capabilities. But the next frontier is multi-agent systems—AI-powered agents working together to monitor, reason, and act. We've explored how generative AI powers predictive maintenance in previous posts. In this blog post, we'll go deeper into multi-agent systems and how MongoDB makes it easy to build and scale them for smart, responsive maintenance strategies.

Advance your data-driven manufacturing strategy with Agentic AI

AI agents combine large language models (LLMs) with tools, memory, and logic to autonomously handle complex tasks. On the shop floor, this means agents can automate inspections, reoptimize production schedules, assist with fault diagnostics, and more. According to a LangChain survey, 78% of companies are actively developing AI agents, and over half already have at least one agent in production. Manufacturing companies can especially benefit from agentic capabilities across a great variety of practical use cases, as shown in Figure 1.

Figure 1. Agent capabilities and related practical use cases in manufacturing.

But leveraging AI agents in industrial environments presents unique challenges. Integration with industrial protocols like Modbus or PROFINET is complex. Governance and security requirements are strict, especially when agents interact with production equipment. Latency is also a concern, as AI models need fast, reliable data access to support real-time responses. And with agents generating and consuming large volumes of data, companies need a data foundation that is reliable and can scale without sacrificing performance. Many of these challenges are not new to manufacturers—and MongoDB has a proven track record of addressing them. Industry leaders in manufacturing and automotive trust MongoDB to power critical IoT and telemetry use cases. Bosch, for example, uses MongoDB to store, manage, and analyze huge amounts of data to power its Bosch IoT Insights solution. MongoDB's flexible document model is ideal for diverse sensor inputs and machine telemetry, while allowing systems to iterate and evolve quickly.
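To illustrate that flexibility, the sketch below stores two differently shaped telemetry readings in the same collection; the database, collection, and field names are hypothetical examples, not an actual customer schema.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

telemetry = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["factory"]["machine_telemetry"]

# Two sensors with different payload shapes can live side by side in one collection,
# so new machine types can be onboarded without schema migrations.
telemetry.insert_many([
    {
        "machineId": "press-017",
        "timestamp": datetime.now(timezone.utc),
        "sensor": "vibration",
        "reading": {"rmsMmPerSec": 4.7, "axis": "x"},
    },
    {
        "machineId": "oven-003",
        "timestamp": datetime.now(timezone.utc),
        "sensor": "thermal",
        "reading": {"zoneTemperaturesC": [182.5, 190.1, 187.3], "setpointC": 185.0},
        "firmware": {"version": "2.4.1"},  # extra fields are fine; documents need not match
    },
])
```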
It's important to remember that, at its core, MongoDB was built for change, so when it comes to integrating AI on the shop floor, it's no surprise that MongoDB is emerging as the ideal data layer foundation. Companies like Novo Nordisk and Cisco rely on MongoDB to build and scale their AI capabilities, and leading platforms like XMPro APEX AI leverage MongoDB Atlas to create and manage advanced AI agents for industrial applications. MongoDB Atlas makes it easy to build AI agents and operate them at scale. As both a vector and a document database, Atlas supports various search methods for agentic RAG, while also enabling agents to store short- and long-term memory in the same database. The result is a unified data layer that bridges industrial IoT and agentic AI. Predictive maintenance is a perfect example of how these capabilities come together to drive real impact on the shop floor. In the next section, we'll walk through a practical blueprint for building a multi-agent predictive maintenance system using MongoDB Atlas.

Building a multi-agent predictive maintenance system

This solution demonstrates how to build a multi-agent predictive maintenance system using MongoDB Atlas, LangGraph, and Amazon Bedrock. The system can streamline complex processes, such as detecting equipment anomalies, diagnosing root causes, generating work orders, and scheduling maintenance. At a high level, it uses MongoDB Atlas as the unified data layer. LangGraph provides the orchestration layer, enabling graph-based coordination among agents, while Amazon Bedrock powers the underlying foundation models the agents use to reason and make decisions. The architecture follows a supervisor-agent pattern. The supervisor coordinates tasks and delegates to three specialized agents:

- Failure agent, which performs root cause analysis and generates incident reports.
- Work order agent, which drafts maintenance work orders with detailed requirements.
- Planning agent, which identifies the optimal time slot for the maintenance task based on availability and production constraints.

Figure 2. High-level architecture of a multi-agent predictive maintenance system.

This modular design enables the system to scale easily and adapt to different operational needs. Let's walk through the full process in four key steps.

Step 1: Failure prediction kicks off the agentic workflow

The process begins with an alert—something unusual in the machine data or logs that could point to a potential failure. MongoDB provides a unified view of operational data, real-time processing capabilities, and seamless compatibility with machine learning tools. Sensor data is processed in real time using Atlas Stream Processing integrated with ML inference models. Features like native support for time series data and Online Archive make it possible to manage telemetry data efficiently at scale, all while downstream applications stay up to date with the latest notifications and dashboards via Atlas Triggers, Change Streams, and Atlas Charts. From there, the supervisor agent takes over and coordinates the next steps.

Figure 3. End-to-end failure prediction process that generates the alerts.

Step 2: Leverage your data for root cause analysis

The supervisor notifies the Failure Agent about the alert. Manually diagnosing a machine can take hours—sifting through manuals, historical logs, and environmental data. The AI agent automates this process.
It collects relevant documents, retrieves contextual insights using Atlas Vector Search, and analyzes environmental conditions stored in the database—like temperature or humidity at the time of failure. With this data, the agent performs a root cause analysis and proposes corrective actions. It generates a concise incident report and shares it with the supervisor agent, which then moves the workflow forward.

Figure 4. Failure Agent performing root cause analysis.

Step 3: Work order process automation

The Work Order Agent receives the incident report and drafts a comprehensive maintenance work order. It pulls from previous similar tasks to estimate time requirements, identify the necessary materials, and ensure the right skill sets are listed. All of this is pre-filled into a standardized work order template and saved back into MongoDB Atlas. This step also includes a human-in-the-loop checkpoint: technicians or supervisors can review and modify the draft before it is finalized.

Figure 5. Work Order Agent generating a draft work order and routing it for human validation.

Step 4: Finding the optimal maintenance schedule

Once the work order is approved, the Planning Agent steps in. Its task is to schedule the maintenance activity without disrupting production. The agent queries the production calendar, checks staff shift schedules, and verifies inventory availability for required materials. It considers alert severity and rescheduling constraints to find the most efficient time slot. Once the optimal window is identified, the agent sends the updated plan to the scheduling system.

Figure 6. Planning Agent evaluating constraints to identify the optimal maintenance schedule.

While we focused on a predictive maintenance workflow, this architecture can be easily extended. Need agents for compliance reporting, spare parts procurement, or shift planning? No problem. With the right foundation, the possibilities are endless.

Unlocking manufacturing excellence with Agentic AI

Agentic AI represents a new chapter in the evolution of predictive maintenance, enabling manufacturers to move from reactive responses to intelligent, autonomous decision-making. By combining AI agents with real-time telemetry and a unified data foundation, teams can reduce downtime, cut maintenance costs, and boost equipment reliability. But to work at scale, these systems need flexible, high-performance infrastructure. With native support for time series data, vector search, stream processing, and more, MongoDB makes it easier to build, operate, and evolve multi-agent solutions in complex industrial environments. The result is smarter operations, greater resilience, and a clear path to manufacturing excellence. Clone the GitHub repository if you are interested in trying out this solution yourself. To learn more about MongoDB's role in the manufacturing industry, please visit our manufacturing and automotive webpage.
Boost Connected Car Developments with MongoDB Atlas and AWS
As vehicles continue to evolve from mechanical systems into connected, software-defined platforms, the automotive industry is continuously being reshaped by data. With modern cars generating terabytes of sensor data daily, a key challenge facing the industry is how to extract timely, actionable insight from that data. A recent survey by McKinsey underscored how important strong connectivity is to car buyers: close to 40% of US survey respondents indicated that they are willing to switch OEMs for better connectivity options. Though connectivity preferences vary widely by country, autonomous driving and safety features are top of mind for many customers. In such a landscape, OEMs need to offer new, innovative use cases on top of customer data. For example, one of MongoDB's large automotive clients is combining car telemetry data with engine noise to perform faster diagnostics and maintenance services.

Combining car telemetry data and Internet of Things (IoT) infrastructure with generative AI unlocks enormous potential for auto manufacturers, from predictive maintenance and remote diagnostics to context-aware driver assistance, smart infotainment, and usage-based insurance models. Imagine a vehicle that not only warns of a failing battery but also proactively recommends the nearest certified service center with the right parts in stock. A fleet manager might analyze driving behavior across hundreds of trucks to optimize fuel efficiency and reduce accident risks. And with improved data, manufacturers could aggregate warranty and performance data across regions to detect early signs of systemic issues, responding before small defects become expensive recalls. Insurance providers, meanwhile, might use real-time driving profiles to offer policies tailored to individual habits, replacing static risk models with dynamic pricing.

To enable such use cases, organizations require a scalable, flexible, and secure data infrastructure. MongoDB Atlas not only offers a flexible document data model but also built-in time series support, high availability, geospatial indexing, and horizontal scalability to handle millions of connected vehicles and the associated use cases and services. Combined with AWS services for IoT, edge processing, machine learning, and generative AI, this stack becomes a robust foundation for intelligent mobility. This blog post explores how enterprises can build such a connected car architecture using MongoDB Atlas, Atlas Vector Search, AWS IoT Greengrass, Amazon Bedrock, and LangChain—as shown in Figure 1. We will convert raw automotive telemetry into real-time business value for drivers, technicians, and fleet managers, using a car maintenance business workflow as an example.

Figure 1. Connected vehicle data platform architecture with MongoDB Atlas and AWS.

The limitations of traditional maintenance models

Vehicle maintenance still follows two basic patterns: reactive and scheduled. In the reactive model, service is initiated only after a problem has already impacted car performance. At that point, it is too late to avoid costly repairs. Scheduled maintenance is more proactive but often inefficient, leading to unnecessary servicing that proves costly for the driver and does not reflect actual wear and usage. The automotive sector needs to shift toward predictive and personalized care, relying on the connected car data that OEMs are already collecting in real time.
But achieving this requires a cloud-native data infrastructure that can support continuous ingestion and real-time processing of this data.

From raw sensor data to driving insight

The connected vehicle data journey begins at the edge. Vehicle operational data—from engine RPM and temperature to battery voltage, tire pressure, and onboard diagnostic codes—can be processed locally on the car using AWS IoT Greengrass, an AWS service that enables local decision-making even without constant cloud connectivity. From there, the data flows into AWS IoT Core and is published to Amazon MSK (Managed Streaming for Apache Kafka). Atlas Stream Processing—which ensures scalable, fault-tolerant stream processing—connects to MSK and ingests this data into MongoDB Atlas, where it is stored using a schema modeled on the Vehicle Signal Specification (VSS), a standard developed by the COVESA alliance.

VSS is a widely adopted open data model that helps normalize vehicle signals and improve interoperability, and it provides a hierarchical, standardized format for structuring vehicle data. It defines a semantic tree of signals, such as Vehicle.Speed, Vehicle.Powertrain.Engine.RPM, or Vehicle.Cabin.Door.FrontLeft.IsOpen, to ensure consistency and interoperability across makes, models, and applications. This consistency is critical for large-scale data analysis, cross-platform integration, and AI training. MongoDB, an active member of the COVESA community, is particularly well suited to implementing VSS. Our document-oriented data model allows developers to store deeply nested, flexible JSON structures without enforcing rigid, normalized schemas. This is especially useful when working with evolving vehicle software platforms or with optional equipment packages, trim levels, and other variations that alter the signal tree. Whether a car has two doors or four, a combustion engine or an electric drivetrain, MongoDB can seamlessly adapt to its VSS-defined structure without structural rework, saving time and money for OEMs.

Once vehicle data lands in MongoDB Atlas, a series of event-driven triggers enables real-time reactions. Atlas Triggers can detect when an engine temperature exceeds safe thresholds and immediately invoke an AWS Lambda function to log the incident, notify support teams via Amazon EventBridge, or create a maintenance task in a service management system. A strong data tiering strategy is also important for connected vehicle use cases. For longer-term trend analysis, vehicle data can be exported to Amazon S3 for model training in Amazon SageMaker. These models can forecast component wear, detect behavioral anomalies, or estimate the remaining useful life (RUL) of key systems. Once a model is trained, it can run inference directly on the MongoDB data and feed prediction results back into the database, closing the loop. The alerts and raw telemetry can live inside MongoDB time series collections, which are optimized for high-speed time series data storage and processing. Time series collections also come with window functions that enable operations on a specified span of documents or a window in time.

Empowering technicians with AI and vector search

Once an alert is raised, we can use gen AI to enhance the customer and technician experience of dealing with and resolving the identified issue. Traditional diagnostic workflows involve sifting through manuals, logs, and systems of record.
Empowering technicians with AI and vector search Once an alert is raised, we can use gen AI to enhance the customer and technician experience of diagnosing and resolving the identified issue. Traditional diagnostic workflows involve sifting through manuals, logs, and systems of record. Now, with Amazon Bedrock and Atlas Vector Search, technicians can simply ask natural-language questions using a chat assistant embedded in a mobile or web application. Unstructured data such as service manuals, historical records, and technical bulletins is vectorized into embeddings, which are indexed and stored in MongoDB Atlas. Once stored and indexed, a technician can ask, “What is the root cause of the service engine light?” and Atlas Vector Search will search the vector embeddings and retrieve the most relevant, semantically aligned documents. These results can be fed into large language models exposed by Amazon Bedrock to generate a response in a conversational language and tone. MongoDB’s vector search capability integrates seamlessly with traditional metadata search, combining structured queries (e.g., vehicle ID, timestamp) with semantic matching. This unified approach enhances technician productivity and shortens repair cycles—resulting in positive customer engagement. To expose this data (and these insights) to different users, we can leverage AWS AppSync as a managed GraphQL interface. Through AppSync, users can query live telemetry, view predicted maintenance needs, or trigger actions like assigning a technician or updating a vehicle’s diagnostic state—ensuring consistency between backend services and user-facing applications. Business impact across automotive domains The potential applications of this architecture span the entire automotive value chain. For example, fleet operators could benefit from predictive service scheduling, improving uptime while reducing costs. Manufacturers could gain insights into failure patterns, enabling them to make data-driven decisions about component design or supplier quality. Dealerships could improve first-time fix rates with AI-guided diagnostics, while insurance companies could implement usage-based models grounded in real driving behavior. Even suppliers and logistics chains could benefit, using aggregated data to anticipate demand and optimize inventory levels. Smart vehicles, smart connectivity MongoDB’s high-performance, scalable database—paired with the IoT, AI, and machine learning capabilities of AWS—creates a responsive, resilient connected car platform. As vehicles grow smarter, so too must the systems that manage their data. MongoDB’s alignment with the VSS standard ensures that automotive data remains interoperable, searchable, and AI-ready. Atlas Vector Search ensures efficient retrieval of the context stored in unstructured data, and when paired with AWS services like IoT Greengrass, SageMaker, Bedrock, and AppSync, this architecture allows enterprises to scale to millions of connected vehicles with confidence. For more information on how to model data in MongoDB using the VSS specification, check out our other article. To see these concepts in action, visit our GitHub repository for hands-on experience and detailed instructions. To learn more about MongoDB’s role in the manufacturing industry, please visit our manufacturing and automotive page.
Scale Performance with View Support for MongoDB Atlas Search and Vector Search
We are thrilled to announce the general availability (GA) of View Support for MongoDB Atlas Search and Atlas Vector Search, available on MongoDB versions 8.0+. This new feature allows you to perform powerful pre-indexing optimizations—including Partial Indexing to filter your collections and Document Transformation to reshape your data for peak performance. View Support for MongoDB Atlas Search helps you build more efficient, performant, and cost-effective search experiences by giving you precise control over your search strategy. Let's look at how it works. How it works in 3 simple steps At its core, View Support is powered by MongoDB views, queryable objects whose contents are defined by an aggregation pipeline on other collections or views. Getting started is straightforward: Create a view: Define a standard view using an aggregation pipeline to filter or transform your source collection. This feature is designed to support views that contain the stages $match with an $expr operator, $addFields, and $set. Note: Views with multi-collection stages like $lookup are not supported for search indexing at this time. Index the view: Build your MongoDB Atlas Search or Atlas Vector Search index directly on the view you just created. Query the view: This is the best part. You run your $search, $searchMeta, or $vectorSearch queries directly against the view itself to get results from your perfectly curated data. With this simple workflow, you can fine-tune exactly what is indexed and how. The two key capabilities you can use today are Partial Indexing and Document Transformation. Figure 1. High-level architectural diagram of search index replication on a view. Search indexes perform an initial sync on the collection and apply the view pipeline before saving the search index to disk storage. Index only what you need with partial indexing Often, only a subset of your data is truly relevant for search. Imagine an e-commerce catalog where only "in-stock" products should be searchable, or a RAG system where only documents containing vector embeddings will be retrieved. With Partial Indexing, you can use a $match stage in your view to create a highly focused index that: Reduces index size: Dramatically shrink the footprint of your search indexes, leading to cost savings and faster operations. Improves performance: Smaller indexes mean faster queries and shorter index build times. Optimize your data model with document transformation Beyond filtering, you can also reshape documents for optimal search performance. Using $addFields or $set in your view, you can create a search-optimized version of your data without altering your original collection. This is perfect for: Pre-computing values: Combine a firstName and lastName into a fullName field for easier searching, or pre-calculate the number of items in an array. Supporting all data types: Convert incompatible data types, like Decimal128, into search-compatible types like Double. You can also convert booleans or ObjectIds to strings to enable faceting. Flattening your schema: Promote important fields from deeply nested documents to the top level, simplifying queries and improving performance over expensive $elemMatch operations. Testing different vector dimensionalities: Slice large Matryoshka (MRL) embeddings from Voyage AI into smaller dimensions to evaluate the tradeoffs between accuracy and performance. For example, consider a vacation home rental company with a listings collection storing reviews as an array of objects. To enable end users to filter for listings with more than N reviews, they create a view called listingsSearchView. The view pipeline of listingsSearchView uses an $addFields stage to add a numReviews field, computed from the size of the reviews array. By creating a search index on listingsSearchView, they can run efficient $search queries on numReviews without compromising data integrity in the source collection.
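A minimal sketch of this workflow in Python with pymongo follows. The connection string, database name, index name, and the review-count threshold are illustrative assumptions; note also that a newly created search index takes a short time to build before queries return results, and the feature requires MongoDB 8.0+ on Atlas.

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
db = client["vacation_rentals"]  # hypothetical database name

# 1. Create a view that adds a computed numReviews field to each listing.
db.command({
    "create": "listingsSearchView",
    "viewOn": "listings",
    "pipeline": [
        {"$addFields": {"numReviews": {"$size": {"$ifNull": ["$reviews", []]}}}}
    ],
})

# 2. Build an Atlas Search index directly on the view.
db["listingsSearchView"].create_search_index(
    SearchIndexModel(
        name="listings_view_index",
        definition={"mappings": {"dynamic": True}},
    )
)

# 3. Query the view; the range operator filters on the computed field.
results = db["listingsSearchView"].aggregate([
    {"$search": {
        "index": "listings_view_index",
        "range": {"path": "numReviews", "gt": 50},  # "more than N reviews"
    }},
    {"$limit": 10},
])
for doc in results:
    print(doc.get("name"), doc.get("numReviews"))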
Figure 2. High-level architectural diagram of running search queries on a view. After the search index identifies the documents to return, mongod applies the view pipeline to return the view documents. Why these optimizations are critical for scaling As your application and data volume grow, search efficiency can become a bottleneck. View Support for MongoDB Atlas Search provides the critical tools you need to maintain blazing-fast performance and control costs at scale by giving you granular control over your indexes. We are incredibly excited to see how you use these new capabilities to build the next generation of powerful search experiences on MongoDB Atlas. Ready to get started? Dive into the documentation to learn more: Atlas Search, Atlas Vector Search. Note: We plan to add compatibility for more types of views in the future. If there’s a stage that you want to see, please let us know.
How Tavily Uses MongoDB to Enhance Agentic Workflows
As AI agents grow in popularity and are used in increasingly mission-critical ways, preventing hallucinations and giving agents up-to-date context is more important than ever. Context can come from many sources—prompts, documents, proprietary internal databases, and the internet itself. Among these sources, the internet stands out as uniquely valuable, a best-in-class resource for humans and LLMs alike due to its massive scale and constant updates. But how can large language models (LLMs) access the latest and greatest information from the internet? Enter Tavily , one of the companies at the heart of this effort. Tavily provides an easy way to connect the web to LLMs, giving them the answers and context they need to be even more useful. MongoDB had the opportunity to sit down with Rotem Weiss, CEO of Tavily, and Eyal Ben Barouch, Tavily’s Head of Data and AI, to talk about the company’s history, how Tavily uses MongoDB, and the future of agentic workflows. Tavily’s origins Tavily began in 2023 with a simple but powerful idea. "We started with an open source project called GPT Researcher ," Weiss said. "It did something pretty simple—go to the web, do some research, get content, and write a report." That simplicity struck a chord. The project exploded, getting over 20,000 GitHub stars in under two years, signaling to the team that they had tapped into something developers desperately needed. The viral success revealed a fundamental gap in how AI systems access information. "So many use cases today require real-time search, whether it's from the web or from your users," Weiss noted. "And that is basically RAG (retrieval-augmented generation) ." "Developers are slowly realizing not everything is semantic, and that vector search alone cannot be the only solution for RAG," Weiss said. Indeed, for certain use cases, vector stores benefit from further context. This insight, buttressed by breakthrough research around CRAG (Corrective RAG) , pointed toward a future where systems automatically turn to the web to search when they lack sufficient information. Solving the real-time knowledge problem Consider the gap between static training data and our dynamic reality. Questions like "What is the weather today?" or "What was the score of the game last night?" require an injection of real-time information to accurately answer. Tavily's system fills this gap by providing AI agents with fresh, accurate data from the web, exactly when they need it. The challenge Tavily addresses goes beyond information retrieval. “Even if your model ‘knows’ the answer, it still needs to be sent in the right direction with grounded results—using Tavily makes your answers more robust,” Weiss explained. The new internet graph Weiss envisions a fundamental shift in how we think about the architecture of the web. "If you think about the new internet, it’s a fundamentally different thing. The internet used to be between people—you would send emails, you would search websites, etc. Now we have new players, the AI agents, who act as new nodes on the internet graph." These new nodes change everything. As they improve, AI agents can perform many of the same actions as humans, but with different needs and expectations. "Agents want different things than people want," Weiss explained. "They want answers; they don't need fancy UIs and a regular browser experience. They need a quick, scalable system to give them answers in real time. That's what Tavily gives you." The company's focus remains deliberately narrow and deep. 
"We always want to stick to the infrastructure layer compared to our competitors, since you don't know where the industry is going," Weiss said. "If we focus on optimizing the latency, the accuracy, the scalability, that's what is going to win, and that's what we're focused on." Figure 1. The road to insightful responses for users with TavilyHybridClient. MongoDB: The foundation for speed and scale To build their infrastructure, Tavily needed a database that could meet their ambitious performance requirements. For Weiss, the choice was both practical and personal. "MongoDB is the first database I ever used as a professional in my previous company," he said. "That's how I started, and I fell in love with MongoDB. It's amazing how flexible it is–it's so easy to implement everything." The document model, the foundation upon which MongoDB is built, allowed Tavily to build and scale an enterprise-grade solution quickly. But familiarity alone didn't drive the decision. MongoDB Atlas had the performance characteristics Tavily required. "Latency is one of the things that we always optimize for, and MongoDB delivers excellent price performance," Tavily’s Ben Barouch explained. "The performance is much more similar to a hot cache than a cold cache. It's almost like it's in memory!" The managed service aspect proved equally crucial. "MongoDB Atlas also saves a lot of engineering time," Weiss noted. In a fast-moving startup environment, MongoDB Atlas enabled Weiss to focus on building Tavily and not worry about the underlying data infrastructure. "Today, companies need to move extremely fast, and at very lean startups, you need to only focus on what you are building. MongoDB allows Tavily to focus on what matters most, our customers and our business." Three pillars of success The Tavily team highlighted three specific MongoDB Atlas characteristics that have become essential to their operations: Vector search : Perhaps most importantly for the AI era, MongoDB's vector search capabilities allow it to be "the memory for agents." As Weiss put it, "The only place where a company can have an edge is their proprietary data. Every company can access the best models, every company can search the web, every company can have good agent orchestration. The only differentiation is utilizing your internal, proprietary data and injecting it in the fastest and most efficient way to the prompt." MongoDB, first with Atlas Vector Search and now with Hybrid Search , has effective ways of giving agents performant context, setting them apart from those built with other technologies. Autoscaling : "Our system is built for a very fast-moving company, and we need to scale in a second," Weiss continued. "We don't need to waste time each week making changes that are done automatically by MongoDB Atlas." Monitoring : "We have other systems where we need to do our own monitoring with other cloud providers, and it's a lot of work that MongoDB Atlas takes care of for us," Weiss explained. "MongoDB has great visibility." Betting on proven innovation Tavily has been impressed with the way MongoDB has kept a finger on the pulse of the evolving AI landscape and added features accordingly. “I believed that MongoDB would be up to date quickly, and I was right," Weiss said. "MongoDB quickly thought about vector search, about other features that I needed, and got them in the product. Not having to bolt-on a separate vector database and having those capabilities natively in Atlas is a game changer for us." 
Ben Barouch emphasized the strategic value of MongoDB’s entire ecosystem, including the community built around the database: "When everyone's offering the same solutions, they become the baseline, and then the things that MongoDB excels at, things like reliability and scalability, are really amplified. The community, especially, is great; MongoDB has excellent developer relations, so learning and using MongoDB is very easy." The partnership between MongoDB and Tavily extends beyond technology to trust. "In this crazy market, where you have new tools every two hours and things are constantly changing, you want to make sure that you're choosing companies you trust to handle things correctly and fast," Weiss said. "I want a vendor where if I have feedback, I'm not afraid to say it, and they will listen." Looking ahead: The multi-agent future As Tavily continues building the infrastructure for AI agents to search the web, Weiss sees the next evolution already taking shape. "The future is going to be thinking about combining these one, two, three, four agents into a workflow that makes sense for specific use cases and specific companies. That will be the new developer experience." This vision of orchestrated AI workflows represents just the beginning. With MongoDB Atlas providing the scalable, reliable foundation they need, Tavily is positioning itself at the center of a fundamental shift in how information flows through our digital world. The internet welcomed people first, then connected them in revolutionary ways. Now, as AI agents join the network, companies like Tavily are building the infrastructure to ensure this next chapter of digital evolution is both powerful and accessible. With MongoDB as their foundation, they're not just adapting to the future—they're building it. Interested in building with MongoDB Atlas yourself? Try it today ! Use Tavily for working memory in this MongoDB tutorial . Explore Tavily’s Crawl to RAG example.
Fine-tune MongoDB Deployments with AppMap’s AI Tools and Diagrams
In a rapidly changing landscape, organizations that adapt for growth, efficiency, and competitiveness will be best positioned to succeed. Central to this effort is the continuous fine-tuning and troubleshooting of existing deployments, enabling companies to deliver high-performance applications that meet their business requirements. Yet navigating application components often leads to long development cycles and high costs. Developers spend valuable time deciphering various programming languages, frameworks, and infrastructures to optimize their systems. They may have to work with complicated, intertwined code, which makes updates difficult. Moreover, older architectures compound information overload when there is no institutional memory of how current workloads behave. To help organizations overcome these challenges, AppMap partnered with MongoDB Atlas to fine-tune MongoDB deployments and achieve optimal performance, enabling developers to build more modern and efficient applications. The AppMap solution empowers developers with AI-driven insights and interactive diagrams that clarify application behavior, decode complex application architectures, and streamline troubleshooting. This integration delivers personalized recommendations for query optimization, proper indexing, and better database interactions. Complementing these capabilities, MongoDB Atlas offers the flexibility, performance, and security essential for building resilient applications and advancing AI-powered experiences. AppMap’s technology stack Founded in 2020 by CEO Elizabeth Lawler, AppMap empowers developers to visualize, understand, and optimize application behavior. By analyzing applications in action, AppMap delivers precise insights into interactions and performance dynamics, recording APIs, functions, and service behaviors. This information is then presented as interactive diagrams, as shown in Figure 1, which can be easily searched and navigated to streamline the development process. Figure 1. Interactive diagram for a MongoDB query. As shown below, AppMap also features Navie, an AI assistant. Navie offers customers advanced code architecture analysis and customized recommendations, derived from capturing application behavior at runtime. This rich data empowers Navie to deliver smarter suggestions, assisting teams in debugging complex issues, asking contextual questions about unfamiliar code, and making more informed code changes. Figure 2. The AppMap Navie AI assistant. With these tools, AppMap improves the quality of the code running with MongoDB, helping developers better understand the flow of their apps. Using AppMap in a MongoDB application Imagine that your team has developed a new e-commerce application running on MongoDB, but you're unfamiliar with how it operates and would like to gain insights into its behavior. In this scenario, you analyze your application with AppMap by executing the node package alongside your standard run command:

npx appmap-node npm run dev

With this command, you use your application just like you normally would, but now every time your app communicates through an API, AppMap creates records. These records are used to generate diagrams that help you see and understand how your application works. You can explore these diagrams to gain insights into your app's behavior and how it interacts with the MongoDB database. Figure 3. Interaction diagram for an e-commerce application.
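As a hypothetical illustration of acting on such a finding (say the diagrams reveal a hot query that scans an entire collection), index tuning with pymongo might look like the sketch below. The collection, query shape, and field names are invented for this example and are not actual AppMap or Navie output.

from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
orders = client["ecommerce"]["orders"]  # hypothetical collection from the demo app

# Suppose the diagrams flag this frequent query as scanning the whole collection.
query = {"customerId": "C-1042", "status": "shipped"}

# Inspect the current plan; a COLLSCAN stage indicates no supporting index.
plan = orders.find(query).explain()
print(plan["queryPlanner"]["winningPlan"])

# Act on the finding: add a compound index that covers the query shape.
orders.create_index([("customerId", 1), ("status", 1)])

# Re-running explain should now show an IXSCAN on the new index.
plan = orders.find(query).explain()
print(plan["queryPlanner"]["winningPlan"])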
Next, you can use the Navie AI assistant to receive tailored insights and suggestions for your application. For instance, you can ask Navie to identify the MongoDB commands your application uses and to provide advice on optimizing query performance. Navie will identify the workflow of your application and may propose strategies to refine database queries, such as adding indexes for improved efficiency or adjusting aggregation framework parameters. Figure 4. Insights provided by the Navie AI assistant. With this framework established, you can seamlessly interact with your MongoDB application, gain insights into its usage, enhance its performance, and achieve quicker time to market. Enhancing MongoDB apps with AppMap Troubleshooting and optimizing your MongoDB applications can be challenging due to the complexity of the interconnected microservices that make up your services. AppMap facilitates this process by providing in-depth insights into application behavior through an AI-powered assistant, helping developers better understand their code. With faster root cause analysis and deeper code understanding, businesses can boost developer productivity, improve application performance, and enhance customer satisfaction. These benefits ultimately lead to greater agility and a stronger competitive position in the market. Enhance your development experience with MongoDB Atlas and AppMap. To learn more about how to fine-tune apps with MongoDB, check out the best practices guide for MongoDB performance and stop by our Partner Ecosystem Catalog to read about our integrations with MongoDB’s ever-evolving partner ecosystem.
MongoDB and Delbridge: Unlocking Flexible and Custom Data Integration
Modern applications are growing in complexity and scale, and seamless data integration is becoming a vital business priority. It’s critical to provide developers with tools that enable efficient access to data, allowing them to build powerful applications that deliver exceptional user experiences. MongoDB Atlas’s unified database platform empowers teams to build the next generation of applications with the flexibility and performance required for today’s fast-moving, data-driven world. As technology continues to evolve, businesses now have an exciting opportunity to embrace the next level of integration solutions. Now generally available, the Delbridge Data API is a modern solution designed to help organizations unlock even greater value from their MongoDB systems—with features built for scalability, security, and customization. Navigating the future of data integration Delbridge simplifies development workflows by enabling frontend applications to access data directly, often removing the need for custom backend infrastructure. This approach is especially effective for initial projects and prototypes, where speed and simplicity are key. However, as applications grow and scale, organizations increasingly need solutions that can handle evolving complexity, such as integrating custom business logic, ensuring compliance with diverse regulatory standards, or adapting workflows to hybrid or multi-cloud environments. Businesses now seek integration platforms that offer a greater level of control, flexibility, and security. The Delbridge Data API: Built for business growth The Delbridge Data API was built as a lightweight, developer-friendly alternative to the now-deprecated MongoDB Data API. It preserves the convenience of the original MongoDB API while offering a streamlined experience tailored to modern developer workflows, enhancing functionality to keep pace with the demands of modern applications. It provides all the essential operations, such as reading, writing, deleting, and updating data, while giving teams far greater control over how requests are processed and secured. Whether businesses need custom validations, tailored access rules, or advanced observability, Delbridge enables them to design solutions to meet their specific needs. One of the biggest advantages of Delbridge is how it aligns with the ways businesses are evolving. As organizations adopt microservices architecture, hybrid cloud strategies, and event-driven data flows, they need an integration tool that can adapt seamlessly to their infrastructure. Delbridge acts as a customizable gateway between MongoDB and your applications, giving you the flexibility to tailor your data access layer while ensuring optimal performance. Real-world example: Optimizing ride-sharing platforms with microservices Imagine a ride-sharing platform that operates across multiple cities, managing millions of drivers, riders, and trips daily. The system relies on microservices to handle critical tasks such as driver routing, fare calculation, real-time location tracking, and customer communications. To ensure efficiency at scale, the platform needs to validate ride requests, optimize driver assignments, and handle dynamic pricing based on demand—all while maintaining low latency and high reliability. By adopting the Delbridge Data API, the platform achieved significant enhancements: Applied custom business logic to dynamically match riders with nearby drivers based on estimated arrival times and trip preferences. 
Integrated real-time pricing adjustments tied to demand surges, geographic zones, and rider behavior. Optimized event-driven workflows for ride updates and notifications (e.g., alerts for driver arrivals or delays). Improved observability with custom dashboards, logs, and metrics to monitor system performance and identify bottlenecks instantly. Figure 1. Delbridge Data API integrated with MongoDB Atlas. With these upgrades, the platform scaled seamlessly while delivering a faster and smoother experience to both riders and drivers. The flexibility of Delbridge enabled the platform to tailor its operations, meet regional demands, and support its growing microservices-based architecture. Unlocking flexible data integration with the Delbridge Data API Adopting the Delbridge Data API offers significant benefits for businesses looking to grow strategically. Its customization features allow organizations to tailor their integrations to meet unique requirements, whether it’s by adding middleware, enforcing specific business rules, or creating tenant-level controls. The API’s scalability empowers teams to efficiently handle increasing volumes of data and users with advanced capabilities such as caching, rate limiting, and distributed deployments. It also enhances observability by providing detailed logging, tracing, and error management hooks, enabling faster troubleshooting and optimized performance. Furthermore, the Delbridge Data API helps organizations meet internal and external compliance needs with features like IP whitelisting, role-based access control (RBAC), and fine-grained permissions. By leveraging these capabilities, businesses gain full ownership of their data layer, ensuring it adapts to today’s objectives while remaining flexible enough to anticipate future challenges. Begin your journey Migrating to the Delbridge Data API is a straightforward process, designed to minimize disruption while delivering quick results. Businesses can start by mirroring traffic to both APIs to test performance, gradually migrate critical endpoints, and monitor progress to ensure a smooth transition. Once operations are fully aligned, the legacy API usage can be retired seamlessly. Explore how Delbridge and MongoDB enable flexible, scalable, and secure integration through the Delbridge Data API— check out the Delbridge API page to learn more and to request access! Read more about Delbridge Solutions on its MongoDB partner ecosystem page .
Introducing voyage-context-3: Focused Chunk-Level Details with Global Document Context
Note to readers: voyage-context-3 is currently available through the Voyage AI API directly. For access, sign up for Voyage AI. TL;DR: We’re excited to introduce voyage-context-3, a contextualized chunk embedding model that produces chunk vectors capturing the full document context without any manual metadata or context augmentation, leading to higher retrieval accuracy than standard embeddings with or without such augmentation. It’s also simpler, faster, and cheaper, and it is a drop-in replacement for standard embeddings that requires no downstream workflow changes while also reducing sensitivity to chunking strategy. On chunk-level and document-level retrieval tasks, voyage-context-3 outperforms OpenAI-v3-large by 14.24% and 12.56%, Cohere-v4 by 7.89% and 5.64%, Jina-v3 late chunking by 23.66% and 6.76%, and contextual retrieval by 20.54% and 2.40%, respectively. It also supports multiple dimensions and multiple quantization options enabled by Matryoshka learning and quantization-aware training, saving vector database costs while maintaining retrieval accuracy. For example, voyage-context-3 (binary, 512) outperforms OpenAI-v3-large (float, 3072) by 0.73% while reducing vector database storage costs by 99.48%—virtually the same performance at 0.5% of the cost. We’re excited to introduce voyage-context-3, a novel contextualized chunk embedding model, where each chunk embedding encodes not only the chunk's own content but also captures contextual information from the full document. voyage-context-3 provides a seamless drop-in replacement for standard, context-agnostic embedding models used in existing retrieval-augmented generation (RAG) pipelines, while offering improved retrieval quality through its ability to capture relevant contextual information. Compared to both context-agnostic models with isolated chunking (e.g., OpenAI-v3-large, Cohere-v4) and existing methods that add context and metadata to chunks, including overlapping chunks and attaching metadata, voyage-context-3 delivers significant gains in retrieval performance while simplifying the tech stack. On chunk-level retrieval (retrieving the most relevant chunk) and document-level retrieval (retrieving the document containing the most relevant chunk), voyage-context-3 outperforms on average: OpenAI-v3-large and Cohere-v4 by 14.24% and 12.56%, and 7.89% and 5.64%, respectively. Context-augmentation methods Jina-v3 late chunking[1] and contextual retrieval[2] by 23.66% and 6.76%, and 20.54% and 2.40%, respectively. voyage-3-large by 7.96% and 2.70%, respectively. Chunking challenges in RAG Focused detail vs. global context. Chunking—breaking large documents into smaller segments, or chunks—is a common and often necessary step in RAG systems. Originally, chunking was primarily driven by models’ limited context windows (which have lately been significantly extended by, e.g., Voyage’s models). More importantly, it allows the embeddings to contain precise, fine-grained information about the corresponding passages, and as a result allows the search system to pinpoint precisely relevant passages. However, this focus can come at the expense of broader context. Finally, without chunking, users must pass complete documents to downstream large language models (LLMs), driving up costs as many tokens may be irrelevant to the query.
For instance, if a 50-page legal document is vectorized into a single embedding, detailed information—such as the sentence “All data transmissions between the Client and the Service Provider’s infrastructure shall utilize AES-256 encryption in GCM mode”—is likely to be buried or lost in the aggregate. By chunking the document into paragraphs and vectorizing each one separately, the resulting embeddings can better capture localized details like “AES-256 encryption.” However, such a paragraph may not contain global context—such as the Client’s name—which is necessary to answer queries like “What encryption methods does Client VoyageAI want to use?” Ideally, we want both focused detail and global context—without tradeoffs. Common workarounds—such as chunk overlaps, context summaries using LLMs (e.g., Anthropic’s contextual retrieval), or metadata augmentation—can introduce extra steps into an already complex AI application pipeline. These steps often require further experimentation to tune, resulting in increased development time and serving cost overhead. Introducing contextualized chunk embeddings We’re excited to introduce contextualized chunk embeddings that capture both focused detail and global context. Our model processes the entire document in a single pass and generates a distinct embedding for each chunk. Each vector encodes not only the specific information within its chunk but also coarse-grained, document-level context, enabling richer and more semantically aware retrieval. The key is that the neural network sees all the chunks at the same time and decides intelligently what global information from other chunks should be injected into the individual chunk embeddings. Automatic full-document context awareness: Contextualized chunk embeddings capture the full context of the document without requiring the user to manually or explicitly provide contextual information. This leads to improved retrieval performance compared to isolated chunk embeddings, while remaining simpler, faster, and cheaper than other context-augmentation methods. Seamless drop-in replacement and storage cost parity: voyage-context-3 is a seamless drop-in replacement for standard, context-agnostic embedding models used in existing search systems, RAG pipelines, and agentic systems. It accepts the same input chunks and produces vectors with identical output dimensions and quantization—now enriched with document-level context for better retrieval performance. In contrast to ColBERT, which introduces a far larger number of vectors and higher storage costs, voyage-context-3 generates the same number of vectors and is fully compatible with any existing vector database. Less sensitive to chunking strategy: While chunking strategy still influences RAG system behavior—and the optimal approach depends on data and downstream tasks—our contextualized chunk embeddings are empirically shown to reduce the system's sensitivity to these strategies, because the model intelligently supplements overly short chunks with global context. Contextualized chunk embeddings outperform manual or LLM-based contextualization because neural networks are trained to capture context intelligently from large datasets, surpassing the limitations of ad hoc efforts. voyage-context-3 was trained using both document-level and chunk-level relevance labels, along with a dual objective that teaches the model to preserve chunk-level granularity while incorporating global context.
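To illustrate the drop-in nature of the model, here is a minimal sketch in Python that embeds all chunks of one document in a single contextualized call and stores one MongoDB document per chunk, ready for an Atlas Vector Search index on the embedding field. The client method name (contextualized_embed), its parameters, and the response shape follow Voyage AI's published examples but may differ across SDK versions; the connection string, collection names, and sample text (reusing the legal-contract example above) are illustrative.

# pip install voyageai pymongo
import voyageai
from pymongo import MongoClient

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

# All chunks of one document are embedded in a single pass, so each chunk
# vector can absorb document-level context (e.g., the Client's name).
doc_chunks = [
    "Master Services Agreement between VoyageAI (the Client) and the Service Provider.",
    "All data transmissions between the Client and the Service Provider's "
    "infrastructure shall utilize AES-256 encryption in GCM mode.",
]

# Assumed method name and response layout, per Voyage AI's announcement examples.
result = vo.contextualized_embed(
    inputs=[doc_chunks],          # one inner list per document
    model="voyage-context-3",
    input_type="document",
)
chunk_embeddings = result.results[0].embeddings

# Store one MongoDB document per chunk; the vectors are the same shape as
# standard embeddings, so an existing Atlas Vector Search index definition
# on the "embedding" field needs no changes.
collection = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")["rag"]["chunks"]
collection.insert_many([
    {"doc_id": "msa-001", "chunk_index": i, "text": text, "embedding": emb}
    for i, (text, emb) in enumerate(zip(doc_chunks, chunk_embeddings))
])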
The table below summarizes how contextualized chunk embeddings compare with standard embeddings and context-augmentation approaches:

Approach | Context preservation | Engineering complexity | Retrieval accuracy
Standard embeddings (e.g., OpenAI-v3-large) | None | Low | Moderate
Metadata augmentation & contextual retrieval (e.g., Jina-v3 late chunking) | Partial | High | Moderate-High
Contextualized chunk embeddings (e.g., voyage-context-3) | Full, principled | Low | Highest

Evaluation details Chunk-level and document-level retrieval For a given query, chunk-level retrieval returns the most relevant chunks, while document-level retrieval returns the documents containing those chunks. The figure below illustrates both retrieval levels across chunks from n documents. The most relevant chunk, often referred to as the “golden chunk,” is bolded and shown in green. Its corresponding parent document is shown in blue. Datasets We evaluate on 93 domain-specific retrieval datasets, spanning nine domains: web reviews, law, medical, long documents, technical documentation, code, finance, conversations, and multilingual, which are listed in this spreadsheet. Every dataset contains a set of queries and a set of documents. Each document consists of an ordered sequence of chunks, which we create via a reasonable chunking strategy. As usual, every query has a number of relevant documents, each with a score indicating its degree of relevance; we call these document-level relevance labels, and they are used for the evaluation of document-level retrieval. Moreover, each query also has a list of most-relevant chunks with relevance scores, curated in various ways, including labeling by LLMs. These are referred to as chunk-level relevance labels and are used for chunk-level retrieval evaluation. We also include proprietary real-world datasets, such as technical documentation and documents containing header metadata. Finally, we assess voyage-context-3 across different embedding dimensions and various quantization options, on standard single-embedding retrieval evaluation, using the same datasets as in our previous retrieval-quality-versus-storage-cost analysis. Models We evaluate voyage-context-3 alongside several alternatives, including OpenAI-v3-large (text-embedding-3-large), Cohere-v4 (embed-v4.0), Jina-v3 late chunking (jina-embeddings-v3), contextual retrieval, voyage-3.5, and voyage-3-large. Metrics Given a query, we retrieve the top 10 documents based on cosine similarity and report the normalized discounted cumulative gain (NDCG@10), a standard metric for retrieval quality and a variant of recall.
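For readers who want the metric made concrete, here is a small sketch of NDCG@10 in Python using the standard formulation (an illustration only, not Voyage's evaluation code):

import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k retrieved items.
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, hence +2
        for rank, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(retrieved_relevances, all_relevances, k=10):
    # NDCG@k: DCG of the retrieved ranking divided by the DCG of an ideal ranking.
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(retrieved_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance labels of the documents a system returned, in ranked order, plus the
# full set of labels for this query (used to build the ideal ranking).
print(ndcg_at_k([3, 0, 2, 1], [3, 2, 1, 0, 0]))  # approximately 0.95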
Results All the evaluation results are available in this spreadsheet, and we analyze the data below. Domain-specific quality. The bar charts below show the average retrieval quality of voyage-context-3 with full-precision 2048-dimensional embeddings for each domain. In the chunk-level retrieval chart, voyage-context-3 outperforms all other models across all domains. As noted earlier, for chunk-level retrieval, voyage-context-3 outperforms on average OpenAI-v3-large, Cohere-v4, Jina-v3 late chunking, and contextual retrieval by 14.24%, 7.89%, 23.66%, and 20.54%, respectively. voyage-context-3 also outperforms all other models across all domains in document-level retrieval, as shown in the corresponding chart below. On average, it outperforms OpenAI-v3-large, Cohere-v4, Jina-v3 late chunking, and contextual retrieval by 12.56%, 5.64%, 6.76%, and 2.40%, respectively. Real-world datasets. voyage-context-3 performs strongly on our proprietary real-world technical documentation and in-house datasets, outperforming all other models. The bar chart below shows chunk-level retrieval results; document-level retrieval results are provided in the evaluation spreadsheet. Chunking sensitivity. Compared to standard, context-agnostic embeddings, voyage-context-3 is less sensitive to variations in chunk size and delivers stronger performance with smaller chunks. For example, on document-level retrieval, voyage-context-3 shows only a 2.06% variance, compared to 4.34% for voyage-3-large, and outperforms voyage-3-large by 6.63% when using 64-token chunks. Context metadata. We also evaluate performance when context metadata is prepended to chunks. Even with metadata prepended to chunks embedded by voyage-3-large, voyage-context-3 outperforms it by up to 5.53%, demonstrating better retrieval performance without the extra work and resources required to prepend metadata. Matryoshka embeddings and quantization. voyage-context-3 supports 2048-, 1024-, 512-, and 256-dimensional embeddings enabled by Matryoshka learning, and multiple embedding quantization options—including 32-bit floating point, signed and unsigned 8-bit integer, and binary precision—while minimizing quality loss. To clarify in relation to the previous figures, the chart below illustrates single-embedding retrieval on documents. Compared with OpenAI-v3-large (float, 3072), voyage-context-3 (int8, 2048) reduces vector database costs by 83% with 8.60% better retrieval quality. Further, comparing OpenAI-v3-large (float, 3072) with voyage-context-3 (binary, 512), vector database costs are reduced by 99.48% with 0.73% better retrieval quality; that’s virtually the same retrieval performance at 0.5% of the cost. Try voyage-context-3 voyage-context-3 is available today! The first 200 million tokens are free. Get started with this quickstart tutorial. You can swap voyage-context-3 into any existing RAG pipeline without requiring any downstream changes. Contextualized chunk embeddings are especially effective for: Long, unstructured documents such as white papers, legal contracts, and research reports. Cross-chunk reasoning, where queries require information that spans multiple sections. High-sensitivity retrieval tasks—such as in finance, medical, or legal domains—where missing context can lead to costly errors. To learn more about building AI applications with MongoDB, visit the MongoDB AI Learning Hub. [1] Jina. “Late Chunking in Long-Context Embedding Models.” August 22, 2024. [2] Anthropic. “Introducing Contextual Retrieval.” September 19, 2024.