How AI helped us build smart search in a .NET application. Kernel Memory library, vector databases, and RAG
Search is a key part of most applications. Full-text engines like Elasticsearch are great at matching words, but they don't always consider the context of the query.
Imagine you're searching for a “laptop bag” in an online store where the search is based on a full-text engine. The system might show “travel bag” because both products contain the word “bag” even though your query implied otherwise.
Semantic search takes into account not only words but also their meaning. In this article, we'll see how to implement it in .NET using Microsoft's Kernel Memory library. We'll plug in the Qdrant vector database and the text-embedding-ada-002 embedding model from OpenAI, and finally show how Retrieval-Augmented Generation (RAG), an approach that combines search with generative models to produce more accurate answers, works in practice.
All source code is available in our GitHub repository. The application is structured as if it were production-ready, although some aspects are simplified. In this article we only touch on the key points, so we strongly recommend reading it alongside the code. Be sure to read the README; it contains the startup instructions.
This article is produced on a non-commercial basis to foster the open-source AI community in the .NET ecosystem.
If you found our solution useful, support the repository with a star ⭐ on GitHub - it will help the development of the project and attract more developers to discuss and improve semantic search approaches! 🚀
Part 1. Starting a project using Docker-Compose
To deploy the project locally, we use Docker-Compose. This allows us to run all the necessary application components, including the API and the Qdrant vector database. In the docker-compose.yml file, we configure the services for the API and database, and specify the ports and dependencies between them.
services:
  kernelmemory.ecommerce.sample.api:
    image: ${DOCKER_REGISTRY-}kernelmemoryecommercesampleapi
    build:
      context: .
      dockerfile: src/KernelMemory.Ecommerce.Sample.Api/Dockerfile
    ports:
      - 9000:8080

  kernelmemory.ecommerce.sample.qdrant:
    image: qdrant/qdrant
    container_name: KernelMemory.Ecommerce.Sample.Qdrant
    ports:
      - 6333:6333
      - 6334:6334
    expose:
      - 6333
      - 6334
      - 6335
    volumes:
      - qdrant-data:/qdrant/storage

volumes:
  qdrant-data:

docker-compose.yml
If Qdrant doesn't suit you for some reason, the repository also includes a PostgreSQL integration example with the pgvector extension, which can likewise be used to store vector data.
Part 2. Application Configuration
Once we have configured Docker-Compose to deploy the project, the next step is to configure the application and set up all the necessary services. Specifically, we create and configure an application instance using the Kernel Memory library.
In the following code snippet, we configure the application by passing configuration parameters for integration with OpenAI and Qdrant (or, alternatively, Postgres):
public sealed partial class Program
{
    private static async Task Main(string[] args)
    {
        var builder = WebApplication.CreateBuilder(args);
        // ...
        var app = BuildAsynchronousKernelMemoryApp(builder);
        // ...
    }

    private static WebApplication BuildAsynchronousKernelMemoryApp(WebApplicationBuilder appBuilder)
    {
        var openAiConfig = new OpenAIConfig();
        appBuilder.Configuration.BindSection("KernelMemory:Services:OpenAI", openAiConfig);

        var qdrantConfig = new QdrantConfig();
        appBuilder.Configuration.BindSection("KernelMemory:Services:Qdrant", qdrantConfig);

        // Uncomment and configure this section if you want to use Postgres as the memory database
        //var postgresConfig = new PostgresConfig();
        //appBuilder.Configuration.BindSection("KernelMemory:Services:Postgres", postgresConfig);

        var searchClientConfig = new SearchClientConfig();
        appBuilder.Configuration.BindSection("KernelMemory:Retrieval:SearchClient", searchClientConfig);
        // ...

        appBuilder.AddKernelMemory(kmb =>
        {
            kmb.WithOpenAI(openAiConfig);
            kmb.WithQdrantMemoryDb(qdrantConfig);
            // Uncomment the following line to enable Postgres as a memory database.
            //kmb.WithPostgresMemoryDb(postgresConfig);
            kmb.WithSearchClientConfig(searchClientConfig);
            // ...
        });

        return appBuilder.Build();
    }
}
Program.cs
The appsettings.json configuration can be viewed in the project source code.
Part 3. Preparing data for searching
For our example, we use a CSV file that contains a database of products for an online store. This file includes product information such as product name, description, category, and other characteristics.

This is the product entity itself:
public sealed record Product(
    Guid Id,
    string Name,
    string Description,
    decimal Price,
    string PriceCurrency,
    int SupplyAbility,
    int MinimumOrder);
We need to implement vector search so that we can find products not only by the words they contain, but also by their meaning.
To implement such a search, we need to convert textual information about products into a vector representation that can be used to compare semantic proximity. We will use a vector database (Qdrant, in this case) to store and efficiently search such vectors.
Vector search is a method in which objects (for example, products) are represented as vectors - numerical arrays that reflect their meaning.
Instead of searching for exact word matches, the system compares the search query and data vectors to find the most similar objects in terms of meaning. This allows taking into account not only word matches, but also the context and relevance of the information.

The closer two objects are in meaning, the closer their vectors are in this space.
For example, the Hierarchical Navigable Small World (HNSW) algorithm can be used to find similar objects by identifying the vectors closest to the query vector. It works by building a multi-layered graph structure, where each node represents a vector, and edges connect nodes that are close to each other in the vector space. The algorithm navigates through this graph to efficiently compare distances and retrieve the most similar vectors to the query.
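Before moving on, here is a small illustrative snippet (not part of the sample project) showing the kind of distance calculation this is built on: cosine similarity, one of the metrics Qdrant supports for comparing vectors.

public static class VectorMath
{
    // Illustrative only: cosine similarity between two embedding vectors.
    // Values close to 1 mean the underlying texts are semantically close.
    public static double CosineSimilarity(IReadOnlyList<float> a, IReadOnlyList<float> b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Count; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}

With such a measure, a query vector for "laptop bag" would score noticeably higher against a laptop bag's vector than against a travel bag's, even though both descriptions contain the word "bag".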
To convert items into vectors, we will use an embedding model. This model takes a text description of a product and converts it into a vector - a numeric representation that captures the semantics of the product.
In our case, we will be using the text-embedding-ada-002 model from OpenAI. However, it is worth noting that the model selection is configurable via the Kernel Memory library, so you can choose other models depending on your preferences and requirements.

To load products into the system, we will create a ProductIngestion endpoint. This endpoint will accept a CSV file with product data. Then, using the MediatR library, we will pass the data to the ProductIngestionCommandHandler. The endpoint will return the IDs of the added products.
public sealed class ProductIngestion : IEndpoint
{
    public void MapEndpoint(IEndpointRouteBuilder app)
    {
        app.MapPost("api/products/ingestion", async (IFormFile file, ISender sender) =>
        {
            if (file == null || file.Length == 0)
            {
                return ApiResults.Problem(
                    "Endpoints.ProductsIngestion.Failed",
                    "File is missing or empty");
            }

            using var stream = file.OpenReadStream();

            Result<IReadOnlyCollection<string>> result = await sender.Send(new ProductIngestionCommand(stream));
            if (!result.IsSuccess)
            {
                return ApiResults.Problem(result.Error.Code, result.Error.Description);
            }

            return Results.Ok(result.Value);
        })
        .DisableAntiforgery();
    }
}
Below is the ProductIngestionCommandHandler code that processes the uploaded CSV file and imports the products as text using the ImportTextAsync method from the KernelMemory library. Each product, after being serialized into JSON, is transformed into a separate vector.
Behind the scenes, the ImportTextAsync method triggers a series of calls within the KernelMemory library, which interacts with the OpenAI API in real time (with the API key we configured earlier). It uses the text-embedding-ada-002 model to convert the serialized JSON of each product into a vector, which is then returned. The resulting vector is stored in Qdrant by KernelMemory.

public sealed record ProductIngestionCommand(Stream ProductsFileStream) : ICommand<IReadOnlyCollection<string>>;

public sealed class ProductIngestionCommandHandler(
    ICsvReader<Product> csvReader, IKernelMemory memory) : ICommandHandler<ProductIngestionCommand, IReadOnlyCollection<string>>
{
    public async Task<Result<IReadOnlyCollection<string>>> Handle(ProductIngestionCommand request, CancellationToken cancellationToken)
    {
        var readingResult = await csvReader.ReadRecordsAsync(request.ProductsFileStream, cancellationToken);
        if (!readingResult.IsSuccess)
        {
            return Result.Failure<IReadOnlyCollection<string>>(readingResult.Error);
        }

        var importTasks = readingResult.Value.Select(async product =>
        {
            return await memory.ImportTextAsync(
                JsonSerializer.Serialize(product),
                documentId: product.Id.ToString(),
                cancellationToken: cancellationToken);
        });

        var documentIds = await Task.WhenAll(importTasks);
        return documentIds;
    }
}
Now we can test ProductIngestion functionality on the UI of the application:

Part 4. Vector search endpoint
Now that the data has been loaded into Qdrant, let's take a look at how the vector search endpoint is implemented. A GET request is used to receive the searchQuery — the query input by the application user. This query is then passed through MediatR to the ProductVectorSearchQueryHandler.
public class ProductVectorSearch : IEndpoint
{
    public void MapEndpoint(IEndpointRouteBuilder app)
    {
        app.MapGet("api/products/search/vector", async (string searchQuery, ISender sender) =>
        {
            var result = await sender.Send(new ProductVectorSearchQuery(searchQuery));
            return Results.Ok(result.Value);
        });
    }
}
In the code of the ProductVectorSearchQueryHandler, we pass the search query to the SearchAsync method of the KernelMemory library. The logic within SearchAsync interacts with the OpenAI API to convert the query text into a vector. Once the vector is generated, the library queries the Qdrant vector database, which searches for the most similar item vectors, formatted the same way as we structured them in the previous step. This results in JSON data, which needs to be deserialized into Product objects before being returned to the higher level.

Note the parameters minRelevance and limit in the SearchAsync method. These parameters allow us to control how many results are returned from the database and set the minimum similarity threshold (with possible values ranging from 0 to 1):
public sealed record ProductVectorSearchQuery(string SearchQuery) : IQuery<ProductSearchResponse>;

public sealed class ProductVectorSearchQueryHandler(
    IKernelMemory memory,
    IOptions<ProductSearchOptions> options,
    ILogger<ProductVectorSearchQueryHandler> logger) : IQueryHandler<ProductVectorSearchQuery, ProductSearchResponse>
{
    public async Task<Result<ProductSearchResponse>> Handle(
        ProductVectorSearchQuery request,
        CancellationToken cancellationToken)
    {
        var searchResult = await memory.SearchAsync(
            request.SearchQuery,
            minRelevance: options.Value.MinSearchResultsRelevance,
            limit: options.Value.SearchResultsLimit,
            cancellationToken: cancellationToken);

        if (searchResult.NoResult == true)
        {
            return ProductSearchResponse.NoProducts(options.Value.MinSearchResultsRelevance);
        }

        List<Product> foundProducts;
        try
        {
            foundProducts = searchResult.Results
                .SelectMany(res => res.Partitions)
                .Select(part => JsonSerializer.Deserialize<Product>(part.Text)!)
                .ToList();
        }
        catch (JsonException ex)
        {
            logger.LogError(ex, "Failed to deserialize search result partition text for query '{Query}'", request.SearchQuery);
            return ProductSearchResponse.NoProducts(options.Value.MinSearchResultsRelevance);
        }

        return new ProductSearchResponse(
            searchResult.NoResult,
            options.Value.MinSearchResultsRelevance,
            searchResult.Results.Count,
            foundProducts);
    }
}
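The handler returns a ProductSearchResponse record. Its actual definition lives in the repository, but a minimal sketch of the shape implied by the calls above might look like this (the names and types here are an assumption for illustration, not the actual source):

public sealed record ProductSearchResponse(
    bool NoResult,                          // hypothetical: mirrors searchResult.NoResult
    double MinRelevance,                    // hypothetical: the configured similarity threshold
    int ResultsCount,                       // hypothetical: number of matched documents
    IReadOnlyCollection<Product> Products)  // the deserialized products
{
    // Hypothetical factory for the "nothing found" case used above.
    public static ProductSearchResponse NoProducts(double minRelevance) =>
        new(true, minRelevance, 0, Array.Empty<Product>());
}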
We can test the functionality of ProductVectorSearch directly through the application's user interface:

Part 5. Retrieval-Augmented Generation (RAG). How to combine vector search capabilities with large language models?
Vector search is generally effective at retrieving relevant data, but it doesn’t always provide a direct answer. At times, users need more than just a list of products—they require detailed information or personalized recommendations.
This is where Retrieval-Augmented Generation (RAG) comes into play. RAG combines:
- Information Retrieval: Pulling relevant data (from Qdrant).
- Response Generation: Creating a refined response using a large language model like GPT.
RAG is a powerful technique used in more than just e-commerce. It's employed in analytics platforms, personalized assistants, and other complex systems where it's crucial to merge search with insightful response generation.
In our project, we’ll demonstrate a simple example of RAG. We'll leverage a large language model to intelligently filter the products returned by the vector search.
For instance, in the vector search section, let’s input the query “Gaming laptop with RTX 3060”.

As we can see, the search returns 5 laptops, but judging by the descriptions, not all of them have an RTX 3060 graphics card. This does not contradict the principles of semantic search: vector databases look for similar vectors, and from that perspective it hardly matters whether a description mentions an RTX 3060 or an RTX 3080. The vectors for these products are close to each other, which is why they all appear in the results.
In a real e-commerce application, custom user filters would typically handle this situation, but for this example, we’ll filter the results using a large language model. While using a large language model for such filtering might seem like overkill—like using a microscope to hammer nails—it serves as a simple demonstration. For this, we'll use OpenAI’s gpt-4o-mini as our language model.
Let’s return to the UI of the application and send the same query to the search section, this time utilizing RAG for enhanced filtering.

As we can see, this time, only laptops with the RTX 3060 graphics card are returned. In the next section, we will explore how search using RAG is implemented in the application.
Part 6. Retrieval-Augmented Generation (RAG). How does it work?
Before we get into the code, let's look at the diagram.

Let's start with the upper left corner. We have products that have already been processed through the embedding model, their vectors obtained, and stored in a vector database. This is the process we discussed earlier, specifically when we covered how we prepared the data for search.
Now, the vector database holds products stored as vectors, which can be searched. When a user submits a search query, the query is converted into a vector using the embedding model, and we retrieve the most similar vectors along with their payload (referred to as "Context" in the diagram).
With the Context, the Query (i.e., the search query), and the Prompt — an instruction for the model (more on that below) — we send the request to the large language model and receive the result.
In the case of the example from the previous section, this results in filtered laptops.
Let’s move on to the code, starting with how the Prompt — the instruction for the model — is implemented. In the Program class where we configure the application, we need to add the following line:
kmb.WithCustomPromptProvider(new ProductSearchPromptProvider(productSearchOptions.SearchResultsLimit));

This registers a custom prompt provider, ProductSearchPromptProvider, within the Kernel Memory app. The provider is responsible for configuring the instructions sent to the model, based on the search options.
public sealed partial class Program
{
    private static async Task Main(string[] args)
    {
        var builder = WebApplication.CreateBuilder(args);
        // ...
        var app = BuildAsynchronousKernelMemoryApp(builder);
        // ...
    }

    private static WebApplication BuildAsynchronousKernelMemoryApp(WebApplicationBuilder appBuilder)
    {
        // ...
        var productSearchOptions = appBuilder.Configuration
            .GetRequiredSection(ProductSearchOptions.ConfigurationKey)
            .Get<ProductSearchOptions>()
            ?? throw new InvalidProgramException(ProductSearchOptions.ConfigurationKey);

        appBuilder.AddKernelMemory(kmb =>
        {
            // ...
            kmb.WithCustomPromptProvider(new ProductSearchPromptProvider(productSearchOptions.SearchResultsLimit));
        });

        return appBuilder.Build();
    }
}
Now, let’s take a look at the code of the prompt provider itself. There’s nothing particularly complex about it. Essentially, we’re providing the large language model with an instruction that defines the expected JSON format and specifies how the products should be returned.
This simple yet crucial setup ensures that the response from the model aligns with our expectations.
public class ProductSearchPromptProvider : IPromptProvider
{
    private readonly string _productSearchPrompt =
        """
        Facts:
        {{$facts}}
        ======
        Based only on the facts above, return a list of the top {{$searchResultsLimit}} most relevant products based on the user's query below.
        Products may have the same name but different IDs, descriptions, or other attributes.
        Include all relevant details for each product and limit the results to a maximum of {{$searchResultsLimit}} items.
        Ensure the response strictly follows the JSON format specified below.
        Do not use Markdown formatting in the response, as it will be deserialized into JSON.
        Response format:
        [
          {
            "Id": "first product guid",
            "Name": "product name",
            "Description": "product description",
            "Price": price,
            "PriceCurrency": "currency code",
            "SupplyAbility": supply ability,
            "MinimumOrder": minimum order quantity
          },
          {
            "Id": "second product guid",
            "Name": "product name",
            "Description": "product description",
            "Price": price,
            "PriceCurrency": "currency code",
            "SupplyAbility": supply ability,
            "MinimumOrder": minimum order quantity
          }
          ...
        ]
        If no products are found or the user's query is invalid, return an empty JSON array.
        Reply with JSON only. No additional comments or explanations.
        User: {{$input}}
        Products:
        """;

    private readonly EmbeddedPromptProvider _fallbackProvider = new();

    public ProductSearchPromptProvider(int searchResultsLimit)
    {
        _productSearchPrompt = _productSearchPrompt.Replace(
            "{{$searchResultsLimit}}",
            searchResultsLimit.ToString(CultureInfo.InvariantCulture));
    }

    public string ReadPrompt(string promptName)
    {
        return promptName switch
        {
            Constants.PromptNamesAnswerWithFacts => _productSearchPrompt,
            _ => _fallbackProvider.ReadPrompt(promptName) // Fall back to the default
        };
    }
}
Prompt provider code
Next, let's take a look at the endpoint for ProductRagStreamingSearch. This is specifically a streaming endpoint that utilizes the SSE (Server-Sent Events) mechanism. The streaming approach allows our API to return results in chunks, improving the user experience.
For example, when you interact with ChatGPT, you don't wait for the entire message to be generated at once. Instead, it’s written out gradually. We adopt the same approach for our application—streaming the results incrementally as they're generated.
This technique ensures a smoother and faster experience for users, especially when dealing with large sets of data.
The MediatR library provides everything we need to plug an IAsyncEnumerable iterator, which is essential for implementing streaming, into our code.
public class ProductRagStreamingSearch : IEndpoint
{
    public void MapEndpoint(IEndpointRouteBuilder app)
    {
        app.MapGet("api/products/search/rag/streaming", async (
            string searchQuery,
            ISender sender,
            HttpResponse response,
            CancellationToken cancellationToken) =>
        {
            response.ContentType = "text/event-stream";

            var request = new ProductRagSearchStreamRequest(searchQuery);
            var resultStream = sender.CreateStream(request, cancellationToken);

            await foreach (var result in resultStream)
            {
                await response.WriteAsync(result, cancellationToken);
                await response.Body.FlushAsync(cancellationToken);
            }
        });
    }
}
Finally, let's examine the implementation of the ProductRagSearchStreamingRequestHandler. There is nothing complex here: the handler simply produces the IAsyncEnumerable consumed by the endpoint above. Note the minRelevance parameter once again. The JSON returned by the large language model is rendered on the front end of our application; you can explore the details of that implementation in the application's source code if you'd like.
public sealed record ProductRagSearchStreamRequest(string SearchQuery) : IStreamRequest<string>;

public class ProductRagSearchStreamingRequestHandler(IKernelMemory memory, IOptions<ProductSearchOptions> options)
    : IStreamRequestHandler<ProductRagSearchStreamRequest, string>
{
    public async IAsyncEnumerable<string> Handle(
        ProductRagSearchStreamRequest request,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        var answerStream = memory.AskStreamingAsync(
            request.SearchQuery,
            minRelevance: options.Value.MinSearchResultsRelevance,
            options: new SearchOptions { Stream = true },
            cancellationToken: cancellationToken);

        await foreach (var answer in answerStream)
        {
            yield return answer.Result;
        }
    }
}
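As a quick way to see the streaming behaviour outside the browser, here is a hedged sketch of a small console client (not part of the sample project) that consumes the endpoint with plain HttpClient and prints chunks as they arrive. The port 9000 comes from the docker-compose file above; the query text is just an example:

// Illustrative console client for the streaming endpoint.
using var httpClient = new HttpClient { BaseAddress = new Uri("http://localhost:9000") };

var query = Uri.EscapeDataString("Gaming laptop with RTX 3060");
using var response = await httpClient.GetAsync(
    $"api/products/search/rag/streaming?searchQuery={query}",
    HttpCompletionOption.ResponseHeadersRead); // start reading before the whole body is generated

await using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

var buffer = new char[256];
int read;
while ((read = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
    // Each chunk is printed as soon as the model produces it.
    Console.Write(buffer, 0, read);
}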
In fact, the KernelMemory library encapsulates everything needed for RAG, so the sequence diagram below gives a much better picture of what actually happens than the code itself.

Let's test the functionality of ProductRagStreamingSearch in the application’s UI:

Part 7. Conclusion
In this article, we explored the implementation of semantic search in .NET using Kernel Memory and vector databases. We covered how to load data, build vectors with an embedding model, perform vector search, and enhance results through RAG.
It's important to note that while the examples in this article are simple, integrating both vector and full-text search into a production-level application can be complex. Working with embedding models and large language models (LLMs) involves either deploying models locally (e.g., via Ollama) or incurring costs with cloud-based models.
Key takeaways:
✅ Kernel Memory is a library that essentially serves as a large example for building multimodal AI applications with a focus on RAG. Whether you choose to use it or not is up to you, but we highly recommend familiarizing yourself with the practices it includes.
✅ RAG is powerful, but its implementation requires careful architectural planning. Response generation via LLMs can be time-consuming, which must be considered for user experience. Additionally, using LLMs can be costly.
✅ Hybrid search, combining vector and full-text search, can offer even better results. While we didn’t dive into this in the article, it's something we may revisit in the future.
At Yuniko Software, we encourage you to explore vector search and RAG if your project involves data mining or analysis. These approaches could be just what your application needs.
The source code for the project is available in our GitHub repository. We'd appreciate your feedback—feel free to leave comments here or on GitHub!
👉 Go to repository (https://github.com/Yuniko-Software/kernel-memory-ecommerce-sample)
Thank you for reading! 🚀