<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vector Podcast</title>
    <description>The latest articles on DEV Community by Vector Podcast (@vectorpodcast).</description>
    <link>https://dev.to/vectorpodcast</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7515%2F98fb0688-cd71-4a11-8f48-24e85b35f75f.jpeg</url>
      <title>DEV Community: Vector Podcast</title>
      <link>https://dev.to/vectorpodcast</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vectorpodcast"/>
    <language>en</language>
    <item>
      <title>Novel idea in vector search: Wormhole vectors</title>
      <dc:creator>Dmitry Kan</dc:creator>
      <pubDate>Fri, 07 Nov 2025 06:16:39 +0000</pubDate>
      <link>https://dev.to/vectorpodcast/novel-idea-in-vector-search-wormhole-vectors-3dg8</link>
      <guid>https://dev.to/vectorpodcast/novel-idea-in-vector-search-wormhole-vectors-3dg8</guid>
      <description>&lt;p&gt;Vector Podcast episode: educational bit&lt;/p&gt;

&lt;p&gt;Two weeks ago I had the pleasure of co-hosting a lightning session with Trey Grainger on a novel idea in vector search called "Wormhole vectors".&lt;/p&gt;

&lt;p&gt;The approach contrasts with hybrid search.&lt;br&gt;
In hybrid search you convert the same input query into different representations (keywords -&amp;gt; embeddings), run independent queries, and then combine the results.&lt;/p&gt;
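&lt;p&gt;As a concrete sketch of that combination step: Reciprocal Rank Fusion (RRF) is one common way to merge the independently ranked keyword and embedding result lists (the doc IDs below are hypothetical):&lt;/p&gt;

```python
def rrf(rankings, k=60):
    # k=60 is the constant proposed in the original RRF paper.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # from the lexical query
vector_hits = ["doc1", "doc7", "doc9"]   # from the embedding query
fused = rrf([keyword_hits, vector_hits])
```

&lt;p&gt;Documents that appear high in both lists (here doc1 and doc7) rise to the top of the fused ranking.&lt;/p&gt;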

&lt;p&gt;Wormhole vectors open a new way to transcend vector spaces of different natures:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Query in the current vector space&lt;/li&gt;
&lt;li&gt;Find a relevant document set&lt;/li&gt;
&lt;li&gt;Derive a "wormhole" vector to a corresponding region of another vector space&lt;/li&gt;
&lt;li&gt;Traverse to the other vector space with that query&lt;/li&gt;
&lt;li&gt;Repeat as desired across multiple traversals&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;More specifically, if you start from a sparse space, you can take the set of returned documents, pool their embeddings into a single embedding, and use it as a "wormhole" into the dense vector space.&lt;br&gt;
Conversely, if your input is a set of embeddings from a vector search, you can traverse the Semantic Knowledge Graph (SKG) to derive a sparse lexical query that best represents those documents.&lt;/p&gt;
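&lt;p&gt;A minimal sketch of the sparse-to-dense direction, with synthetic data standing in for a real index (in practice the embeddings would come from your document store):&lt;/p&gt;

```python
import numpy as np

# Sparse-to-dense "wormhole": mean-pool the dense embeddings of the
# documents returned by a lexical query, then use the pooled vector
# as the query in the dense space. All data here is synthetic.
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(100, 8))            # the dense index
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

lexical_hits = [3, 17, 42]   # doc IDs returned by the sparse query

# Derive the wormhole vector by pooling (step 3 of the recipe).
wormhole = doc_embeddings[lexical_hits].mean(axis=0)
wormhole /= np.linalg.norm(wormhole)

# Traverse into the dense space with that vector (step 4).
sims = doc_embeddings @ wormhole
dense_hits = np.argsort(-sims)[:5]
```

&lt;p&gt;The same loop can then repeat: take the dense results, derive a sparse query via the SKG, and jump back.&lt;/p&gt;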

&lt;p&gt;Recording on YouTube:&lt;br&gt;


  &lt;iframe src="https://www.youtube.com/embed/fvDC7nK-_C0"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;What you'll learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What are "Wormhole Vectors"?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Learn how wormhole vectors work &amp;amp; how to use them to traverse between disparate vector spaces for better hybrid search.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building a behavioral vector space from click stream data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Learn to generate behavioral embeddings to be integrated with dense/semantic and sparse/lexical vector queries.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traverse lexical, semantic, &amp;amp; behavioral vector spaces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Jump back and forth between multiple dense and sparse vector spaces in the same query.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Advanced hybrid search techniques (beyond fusion algorithms)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hybrid search is more than mixing lexical + semantic search. See advanced techniques and where wormhole vectors fit in.&lt;/p&gt;

&lt;p&gt;Find this episode on many popular platforms:&lt;br&gt;
Spotify: &lt;a href="https://open.spotify.com/episode/3fvfbAGQREqCJeUciyRb3r" rel="noopener noreferrer"&gt;https://open.spotify.com/episode/3fvfbAGQREqCJeUciyRb3r&lt;/a&gt;&lt;br&gt;


&lt;iframe src="https://open.spotify.com/embed/episode/3fvfbAGQREqCJeUciyRb3r" width="100%" height="232px"&gt;
&lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;Apple Podcasts: &lt;a href="https://podcasts.apple.com/fi/podcast/trey-grainger-wormhole-vectors/id1587568733?i=1000735675696" rel="noopener noreferrer"&gt;https://podcasts.apple.com/fi/podcast/trey-grainger-wormhole-vectors/id1587568733?i=1000735675696&lt;/a&gt;&lt;br&gt;


&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://podcasts.apple.com/fi/podcast/trey-grainger-wormhole-vectors/id1587568733?i=1000735675696" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fis1-ssl.mzstatic.com%2Fimage%2Fthumb%2FPodcasts221%2Fv4%2F64%2Fd8%2F0d%2F64d80db7-46d0-acc7-3903-54ead5f99b82%2Fmza_14565107337929233133.png%2F1200x1200ECA.PESS01-60.jpg%3FimgShow%3DPodcasts122%2Fv4%2F26%2F00%2F20%2F260020ba-c79e-3a8d-6083-f55432f6cab1%2Fmza_15058363529978835570.jpg" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://podcasts.apple.com/fi/podcast/trey-grainger-wormhole-vectors/id1587568733?i=1000735675696" rel="noopener noreferrer" class="c-link"&gt;
            Trey Grainger - Wormhole Vecto…–Vector Podcast – Apple Podcasts
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Podcast Episode · Vector Podcast · 7 November 2025 · 1hr 19min
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpodcasts.apple.com%2Fassets%2Ffavicon%2Ffavicon-32.png"&gt;
          podcasts.apple.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;RSS: &lt;a href="https://rss.com/podcasts/vector-podcast/2314900/" rel="noopener noreferrer"&gt;https://rss.com/podcasts/vector-podcast/2314900/&lt;/a&gt;&lt;br&gt;


&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://rss.com/podcasts/vector-podcast/2314900/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.rss.com%2Fvector-podcast%2F900%2Fep_cover_20251107_051156_724d3e0d493d36eed167f0604822b7e3.png" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://rss.com/podcasts/vector-podcast/2314900/" rel="noopener noreferrer" class="c-link"&gt;
            Trey Grainger - Wormhole Vectors | Podcast Episode on RSS.com
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            This lightning session introduces a new idea in vector search - Wormhole vectors! It has deep roots in physics and allows for transcending spaces of any nature: sparse, dense, and behavioural (but could theoretically be any N-dimensional space).
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.rss.com%2Fvector-podcast%2F64%2Fep_cover_20251107_051156_724d3e0d493d36eed167f0604822b7e3.png"&gt;
          rss.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




</description>
      <category>embeddings</category>
      <category>vectorsearch</category>
      <category>keywordsearch</category>
      <category>hybridsearch</category>
    </item>
    <item>
      <title>Vector Podcast: Simon Eskildsen, Turbopuffer</title>
      <dc:creator>Dmitry Kan</dc:creator>
      <pubDate>Fri, 19 Sep 2025 11:03:58 +0000</pubDate>
      <link>https://dev.to/vectorpodcast/vector-podcast-simon-eskildsen-turbopuffer-cfa</link>
      <guid>https://dev.to/vectorpodcast/vector-podcast-simon-eskildsen-turbopuffer-cfa</guid>
      <description>&lt;p&gt;&lt;em&gt;Vector database and search engine behind well known tools: Cursor, Notion, Linear, Superhuman, Readwise&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Vector Podcast is back — Season 4 starts now!&lt;/p&gt;

&lt;p&gt;I’ve had the pleasure of interviewing &lt;a class="mentioned-user" href="https://dev.to/sirupsen"&gt;@sirupsen&lt;/a&gt;, the creator of &lt;a href="https://turbopuffer.com/" rel="noopener noreferrer"&gt;Turbopuffer&lt;/a&gt; — a vector database and search engine. Prior to co-founding his startup, Simon spent a decade at Shopify scaling stateful engines like Elasticsearch, Redis, and MySQL at immense scale (up to 1M reads/sec).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you want to create a &lt;strong&gt;generational DB company&lt;/strong&gt;, I think you need 2 things: a) you need a &lt;strong&gt;new workload&lt;/strong&gt; — the new workload here is that we have almost every company on Earth, sitting on their treasure trove of data and they want to connect that to LLMs (large scale analytics on unstructured data) b) &lt;strong&gt;new storage architecture&lt;/strong&gt; — if you don’t have a new storage architecture, that is fundamentally a better trade-off for the particular workload, then there is no reason why tacking on a secondary index to your relational database, to your OLAP, to your existing search engine [would not work] — they would eat it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08v7b0bh3l35zibezu0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08v7b0bh3l35zibezu0s.png" alt="Simon Eskildsen on Vector Podcast with Dmitry Kan"&gt;&lt;/a&gt; &lt;em&gt;Simon Eskildsen on Vector Podcast with Dmitry Kan&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Turbopuffer uses Object Storage (S3, GCS, Blob Storage) to store client data and vectors (encrypted with customer keys), and RAM and NVMe SSDs to cache the data your application actually uses. This is where the analogy to the &lt;a href="https://en.wikipedia.org/wiki/Tetraodontidae" rel="noopener noreferrer"&gt;puffer fish&lt;/a&gt; comes in: the database inflates and deflates its cache depending on usage. To lower latency towards Object Storage, Turbopuffer minimizes the number of roundtrips to 3–4 (400ms) and uses techniques like Range Fetch (available in &lt;a href="https://www.artofcode.org/blog/aws-s3-byte-range-fetch/" rel="noopener noreferrer"&gt;AWS&lt;/a&gt;, &lt;a href="https://cloud.google.com/storage/docs/samples/storage-download-byte-range" rel="noopener noreferrer"&gt;GCP&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/rest/api/storageservices/specifying-the-range-header-for-blob-service-operations" rel="noopener noreferrer"&gt;Azure&lt;/a&gt;) to accelerate cold reads.&lt;/p&gt;
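&lt;p&gt;A byte-range read is the primitive behind this: fetch only the slice of an index file you need instead of the whole object. A small sketch, with the block size as an illustrative assumption and a local buffer simulating the object store:&lt;/p&gt;

```python
# Build an HTTP Range header for one fixed-size block of an object,
# and simulate what S3/GCS/Azure return for it. The 4 KiB block size
# is an assumption for illustration only.
BLOCK_SIZE = 4096

def range_header(block_index, block_size=BLOCK_SIZE):
    start = block_index * block_size
    end = start + block_size - 1          # Range ends are inclusive
    return f"bytes={start}-{end}"

def serve_range(blob, header):
    # What the object store does with the header: return that slice.
    spec = header.split("=")[1]
    start, end = (int(x) for x in spec.split("-"))
    return blob[start:end + 1]

blob = bytes(range(256)) * 64             # a 16 KiB "object"
chunk = serve_range(blob, range_header(1))
```

&lt;p&gt;With a real client such as boto3, the same string would be passed as the Range argument of the GET request.&lt;/p&gt;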

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4duh6b0kktpe68ftk4ke.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4duh6b0kktpe68ftk4ke.png" alt="Turbopuffer architecture: https://turbopuffer.com/docs/architecture"&gt;&lt;/a&gt; &lt;em&gt;Turbopuffer architecture:&lt;/em&gt; &lt;a href="https://turbopuffer.com/docs/architecture" rel="noopener noreferrer"&gt;https://turbopuffer.com/docs/architecture&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For vector search, the &lt;a href="https://dl.acm.org/doi/10.1145/3600006.3613166" rel="noopener noreferrer"&gt;SPFresh&lt;/a&gt; algorithm is used, which is based on a centroid index. This reminded me of the work I did as part of team Sisu implementing the &lt;a href="https://github.com/DmitryKey/big-ann/tree/main/src/algorithms/sharding/kanndi" rel="noopener noreferrer"&gt;KANNDI algorithm&lt;/a&gt; for the billion-scale ANN benchmarking competition. It is interesting that centroid-based algorithms are a natural fit for Object Storage, since they minimize round-trips and allow for collocated writes. It looks like my intuition was right, and I’m glad someone has productized this!&lt;/p&gt;
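&lt;p&gt;The centroid-index idea itself is simple to sketch (synthetic data; real systems like SPFresh use proper clustering and incremental rebalancing, not the random sampling below):&lt;/p&gt;

```python
import numpy as np

# Minimal centroid-based ANN sketch: assign every vector to its
# nearest centroid at build time, then at query time scan only the
# posting lists of the few clusters whose centroids are closest.
rng = np.random.default_rng(1)
vectors = rng.normal(size=(1000, 16))
centroid_ids = rng.choice(1000, size=32, replace=False)
centroids = vectors[centroid_ids]

# Build: each vector goes to the posting list of its nearest centroid.
assign = np.argmin(
    np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2), axis=1
)
posting = {c: np.where(assign == c)[0] for c in range(32)}

def search(query, nprobe=4, topk=5):
    # Probe only the nprobe nearest clusters, then rank candidates.
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))
    cand = np.concatenate([posting[c] for c in order[:nprobe]])
    dists = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[np.argsort(dists)[:topk]]

hits = search(vectors[7])
```

&lt;p&gt;On object storage each posting list can live in its own contiguous range, so one probe is roughly one round-trip, and writes to the same cluster stay collocated.&lt;/p&gt;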

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/I8Ztqajighg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;All this to say: the episode is full of insights, and I really enjoyed recording with Simon!&lt;/p&gt;

&lt;p&gt;Spotify: &lt;a href="https://open.spotify.com/episode/2DlO2W5QoLFVPvxwxyJaoN" rel="noopener noreferrer"&gt;https://open.spotify.com/episode/2DlO2W5QoLFVPvxwxyJaoN&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apple Podcasts: &lt;a href="https://podcasts.apple.com/us/podcast/economical-way-of-serving-vector-search-workloads/id1587568733?i=1000727464303" rel="noopener noreferrer"&gt;https://podcasts.apple.com/us/podcast/economical-way-of-serving-vector-search-workloads/id1587568733?i=1000727464303&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RSS: &lt;a href="https://rss.com/podcasts/vector-podcast/2222846/" rel="noopener noreferrer"&gt;https://rss.com/podcasts/vector-podcast/2222846/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>rust</category>
      <category>objectstorage</category>
      <category>searchengine</category>
    </item>
    <item>
      <title>Adding ML layer to Search: Hybrid Search Optimizer</title>
      <dc:creator>Dmitry Kan</dc:creator>
      <pubDate>Sat, 12 Apr 2025 22:24:22 +0000</pubDate>
      <link>https://dev.to/vectorpodcast/adding-ml-layer-to-search-hybrid-search-optimizer-m0o</link>
      <guid>https://dev.to/vectorpodcast/adding-ml-layer-to-search-hybrid-search-optimizer-m0o</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;If we were to look 10 years out, I think an ideal solution is that we are not doing hybrid search anymore: we just have a better approach. Something beyond vector + keyword, something better that still supports cases where 0 results is the right answer (sometimes). We would have a better approach, not this slightly band-aidy one; but for now, hybrid search is exciting!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is fascinating and funny how things develop, but also turn around. In 2022–23 everyone was buzzing about hybrid search. In 2024 the conversation shifted to RAG, RAG, RAG. And now we are in 2025 and back to hybrid search — on a different level. Finally, there are strides and contributions towards learning hybrid search parameters with ML. How cool is that?&lt;/p&gt;

&lt;p&gt;When I looked at hybrid search, I instantly knew that fiddling with a and b in a*keyword + b*neural would be the crux of succeeding with this approach to search. I also knew that a better way than manual tweaking would be to apply ML.&lt;/p&gt;

&lt;p&gt;I’m really happy someone clever did this. Daniel Wrigley and Eric Pugh, both from OpenSource Connections, decided to do exactly that: apply machine learning to the problem of computing these coefficients. In other words: what weight to give a keyword match versus a neural search match. And what’s fascinating is that they experimented with a multitude of methods, from global to dynamic (per query), with different permutations, feature groups, combination methods, and query sampling. It sounds like an exhaustive study.&lt;/p&gt;
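&lt;p&gt;A toy version of the global variant: grid-search the weights a and b in a*keyword + b*neural to maximize mean reciprocal rank on a labeled query set (all scores and labels below are synthetic; the actual study applies proper learning-to-rank methods):&lt;/p&gt;

```python
import numpy as np

# Synthetic per-query score matrices for both retrieval modes, plus
# one relevant-document label per query.
rng = np.random.default_rng(2)
n_queries, n_docs = 50, 20
keyword = rng.random(size=(n_queries, n_docs))
neural = rng.random(size=(n_queries, n_docs))
relevant = rng.integers(0, n_docs, size=n_queries)

def mrr(a, b):
    # Mean reciprocal rank of the relevant doc under a*kw + b*nn.
    combined = a * keyword + b * neural
    order = np.argsort(-combined, axis=1)
    ranks = np.argmax(order == relevant[:, None], axis=1) + 1
    return float(np.mean(1.0 / ranks))

# Constrain a + b = 1 and sweep a over a coarse grid.
grid = np.linspace(0.0, 1.0, 11)
best_a = max(grid, key=lambda a: mrr(a, 1.0 - a))
```

&lt;p&gt;Dynamic (per-query) variants replace the single global pair with a model that predicts the weights from query features.&lt;/p&gt;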

&lt;p&gt;What’s even cooler is that all of this is open source.&lt;/p&gt;

&lt;p&gt;Check out this episode, let me (us) know what you think. And remember to subscribe to stay tuned for new episodes.&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/quY769om1EY"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Design: Saurabh Rai, &lt;a href="https://www.linkedin.com/in/srbhr/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/srbhr/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The design of this episode is inspired by a scene in Blade Runner 2049. There’s a clear path leading towards where people want to go, yet they’re searching for something.&lt;/p&gt;

&lt;p&gt;As usual, you can find the episode in audio form on your favorite platform.&lt;/p&gt;

</description>
      <category>federatedsearch</category>
      <category>hybridsearch</category>
      <category>neuralsearch</category>
      <category>keywordsearch</category>
    </item>
    <item>
      <title>7 AI Open Source Libraries To Build RAG, Agents &amp; AI Search</title>
      <dc:creator>𝚂𝚊𝚞𝚛𝚊𝚋𝚑 𝚁𝚊𝚒</dc:creator>
      <pubDate>Thu, 14 Nov 2024 21:08:14 +0000</pubDate>
      <link>https://dev.to/vectorpodcast/7-ai-open-source-libraries-to-build-rag-agents-ai-search-27bm</link>
      <guid>https://dev.to/vectorpodcast/7-ai-open-source-libraries-to-build-rag-agents-ai-search-27bm</guid>
      <description>&lt;h2&gt;
  
  
  What is Retrieval Augmented Generation (RAG)?
&lt;/h2&gt;

&lt;p&gt;Retrieval Augmented Generation (RAG) is an AI technique that combines searching for relevant information with generating responses. It works by first retrieving data from external sources (like documents or databases) and then using this information to create more accurate and context-aware answers. This helps the AI provide better, fact-based responses rather than relying solely on what it was trained on.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does Retrieval Augmented Generation (RAG) Work?
&lt;/h2&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) works by enhancing AI responses with relevant information from external sources. Here’s a concise explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When a user asks a question, RAG searches through various data sources (like databases, websites, and documents) to find relevant information.&lt;/li&gt;
&lt;li&gt;It then combines this retrieved information with the original question to create a more informed prompt.&lt;/li&gt;
&lt;li&gt;This enhanced prompt is fed into a language model, which generates a response that’s both relevant to the question and enriched with the retrieved information. This process allows the AI to provide more accurate, up-to-date, and context-aware answers by leveraging external knowledge sources alongside its pre-trained capabilities.&lt;/li&gt;
&lt;/ol&gt;
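&lt;p&gt;The three steps above, as a minimal sketch: a toy word-overlap retriever, prompt assembly, and a stubbed-out generation call standing in for the LLM (the corpus and scoring are illustrative only):&lt;/p&gt;

```python
corpus = {
    "doc1": "RAG combines retrieval with generation.",
    "doc2": "Vector databases store embeddings.",
    "doc3": "Paris is the capital of France.",
}

def tokens(text):
    return set(w.strip(".,?!") for w in text.lower().split())

def retrieve(question, k=2):
    # Step 1: score documents by word overlap with the question.
    q = tokens(question)
    ranked = sorted(
        corpus,
        key=lambda d: len(q.intersection(tokens(corpus[d]))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, doc_ids):
    # Step 2: combine the retrieved text with the original question.
    context = " ".join(corpus[d] for d in doc_ids)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

def generate(prompt):
    # Step 3: a real pipeline calls a language model here.
    return "[LLM answer grounded in the retrieved context]"

question = "How does RAG use retrieval?"
answer = generate(build_prompt(question, retrieve(question)))
```

&lt;p&gt;Production systems swap the toy retriever for vector or hybrid search and the stub for an actual model call, but the shape of the pipeline is the same.&lt;/p&gt;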

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqp5tiow7mrwykdggnce1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqp5tiow7mrwykdggnce1.png" alt="How does RAG Works" width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How does Retrieval Augmented Generation (RAG) help the AI Model?
&lt;/h2&gt;

&lt;p&gt;RAG makes the AI more reliable and up-to-date by augmenting its internal knowledge with real-world, external data. RAG also improves an AI model in a few key ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Access to Up-to-Date Information&lt;/strong&gt;: RAG retrieves relevant, real-time information from external sources (like documents, databases, or the web). This means the AI can provide accurate responses even when its training data is outdated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Accuracy&lt;/strong&gt;: Instead of relying solely on the AI’s trained knowledge, RAG ensures the model generates responses based on the most relevant data. This makes the answers more accurate and grounded in facts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better Contextual Understanding&lt;/strong&gt;: By combining retrieved data with a user’s query, RAG can offer answers that are more context-aware, making the AI’s responses feel more tailored and specific to the situation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Hallucination&lt;/strong&gt;: Pure AI models sometimes "hallucinate" or make up information. RAG mitigates this by grounding responses in factual, retrieved data, reducing the likelihood of inaccurate or fabricated information.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  7 Open Source Libraries to do Retrieval Augmented Generation
&lt;/h2&gt;

&lt;p&gt;Let's explore some open-source libraries helping you do RAG. These libraries provide the tools and frameworks necessary to implement RAG systems efficiently, from document indexing to retrieval and integration with language models.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. SWIRL
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/swirlai/swirl-search" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbx46wxs2fa74wk2pdum.png" alt="SWIRL" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SWIRL is an open-source AI infrastructure software that powers Retrieval-Augmented Generation (RAG) applications. It enhances AI pipelines by enabling fast and secure searches across data sources without moving or copying data. SWIRL works inside your firewall, ensuring data security while being easy to implement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it unique:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No ETL or data movement required.&lt;/li&gt;
&lt;li&gt;Fast and secure AI deployment inside private clouds.&lt;/li&gt;
&lt;li&gt;Seamless integration with 20+ large language models (LLMs).&lt;/li&gt;
&lt;li&gt;Built for secure data access and compliance.&lt;/li&gt;
&lt;li&gt;Supports data fetching from 100+ applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/swirlai/swirl-search" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ SWIRL on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Cognita
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/truefoundry/cognita" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9nwqlo42vnowcjogoic.png" alt="Cognita" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cognita is an open-source framework for building modular, production-ready Retrieval Augmented Generation (RAG) systems. It organizes RAG components, making it easier to test locally and deploy at scale. It supports various document retrievers, embeddings, and is fully API-driven, allowing seamless integration into other systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it unique:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modular design for scalable RAG systems.&lt;/li&gt;
&lt;li&gt;UI for non-technical users to interact with documents and Q&amp;amp;A.&lt;/li&gt;
&lt;li&gt;Incremental indexing reduces compute load by tracking changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/truefoundry/cognita" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ Cognita on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  3. LLM-Ware
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1h4supvpf46rve3yp8mt.png" alt="LLM-Ware" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LLM Ware is an open-source framework for building enterprise-ready Retrieval Augmented Generation (RAG) pipelines. It is designed to integrate small, specialized models that can be deployed privately and securely, making it suitable for complex enterprise workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it unique:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Offers 50+ fine-tuned, small models optimized for enterprise tasks.&lt;/li&gt;
&lt;li&gt;Supports a modular and scalable RAG architecture.&lt;/li&gt;
&lt;li&gt;Can run without a GPU, enabling lightweight deployments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/llmware-ai/llmware" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ LLMWare on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  4. RAG Flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/infiniflow/ragflow" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98cjqgzyjnr5vzmgj0zy.png" alt="RAG Flow" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RagFlow is an open-source engine focused on Retrieval Augmented Generation (RAG) using deep document understanding. It allows users to integrate structured and unstructured data for effective, citation-grounded question-answering. The system offers scalable and modular architecture with easy deployment options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it unique:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in deep document understanding to handle complex data formats.&lt;/li&gt;
&lt;li&gt;Grounded citations with reduced hallucination risks.&lt;/li&gt;
&lt;li&gt;Support for various document types like PDFs, images, and structured data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/infiniflow/ragflow" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;⭐️ RAG Flow on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Graph RAG
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/microsoft/graphrag" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyob0awzxn6pw3pgulbq6.png" alt="Graph RAG" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GraphRAG is a modular, graph-based Retrieval-Augmented Generation (RAG) system designed to enhance LLM outputs by incorporating structured knowledge graphs. It supports advanced reasoning with private data, making it ideal for enterprises and research applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it unique:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses knowledge graphs to structure and enhance data retrieval.&lt;/li&gt;
&lt;li&gt;Tailored for complex enterprise use cases requiring private data handling.&lt;/li&gt;
&lt;li&gt;Supports integration with Microsoft Azure for large-scale deployments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/microsoft/graphrag" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Graph RAG on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Haystack
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/deepset-ai/haystack" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F623od1u9e9fjstvfa0zm.png" alt="Haystack" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Haystack is an open-source AI orchestration framework for building production-ready LLM applications. It allows users to connect models, vector databases, and file converters to create advanced systems like RAG, question answering, and semantic search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it unique:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flexible pipelines for retrieval, embedding, and inference tasks.&lt;/li&gt;
&lt;li&gt;Supports integration with a variety of vector databases and LLMs.&lt;/li&gt;
&lt;li&gt;Customizable with both off-the-shelf and fine-tuned models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/deepset-ai/haystack" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Haystack on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Storm
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/microsoft/graphrag" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fef6s5tek9l6tmdb7c7yt.png" alt="Storm" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;STORM is an LLM-powered knowledge curation system that researches a topic and generates full-length reports with citations. It integrates advanced retrieval methods and supports multi-perspective question-asking, enhancing the depth and accuracy of the generated content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it unique:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates Wikipedia-like articles with grounded citations.&lt;/li&gt;
&lt;li&gt;Supports collaborative human-AI knowledge curation.&lt;/li&gt;
&lt;li&gt;Modular design with support for external retrieval sources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/stanford-oval/storm" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;🌟 Storm on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges in Retrieval Augmented Generation
&lt;/h2&gt;

&lt;p&gt;Retrieval Augmented Generation (RAG) faces challenges like ensuring data relevance, managing latency, and maintaining data quality. Some challenges are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data relevance&lt;/strong&gt;: Ensuring the retrieved documents are highly relevant to the query can be difficult, especially with large or noisy datasets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: Searching external sources adds overhead, potentially slowing down response times, especially in real-time applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data quality&lt;/strong&gt;: Low-quality or outdated data can lead to inaccurate or misleading AI-generated responses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Handling large-scale datasets and high user traffic while maintaining performance can be complex.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Ensuring data privacy and handling sensitive information securely is crucial, especially in enterprise settings.&lt;/li&gt;
&lt;/ul&gt;
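
&lt;p&gt;To make the data-relevance challenge concrete: with embedding-based retrieval, relevance is typically a cosine-similarity ranking over vectors, and a noisy corpus can surface generic documents that still score deceptively high. Here is a toy illustration (the vectors and document names are made up):&lt;/p&gt;

```python
# Toy illustration of cosine-similarity ranking, the usual relevance
# measure in embedding-based retrieval. Vectors are made up for the example.
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

query_vec = [1.0, 0.0, 1.0]
corpus = {
    "relevant doc":  [0.9, 0.1, 0.8],  # points the same way as the query
    "noisy doc":     [0.7, 0.7, 0.7],  # generic vector, still scores high
    "off-topic doc": [0.0, 1.0, 0.1],
}
ranked = sorted(corpus, key=lambda name: cosine(query_vec, corpus[name]), reverse=True)
for name in ranked:
    print(f"{name}: {cosine(query_vec, corpus[name]):.3f}")
```

&lt;p&gt;Note how the generic “noisy doc” scores above 0.8 despite carrying little signal; this is why retrieval quality depends as much on the corpus as on the model.&lt;/p&gt;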

&lt;p&gt;Platforms like SWIRL tackle these issues by not requiring ETL (Extract, Transform, Load) or data movement, ensuring faster and more secure access to data. &lt;br&gt;
With &lt;a href="https://swirlaiconnect.com/" rel="noopener noreferrer"&gt;SWIRL&lt;/a&gt;, the retrieval and processing happen inside the user’s firewall, which helps maintain data privacy while ensuring relevant, high-quality responses. Its integration with existing large language models (LLMs) and enterprise data sources makes it an efficient solution for overcoming the latency and security challenges of RAG.&lt;/p&gt;
&lt;h3&gt;
  
  
  Thank you for reading 💜
&lt;/h3&gt;

&lt;p&gt;Thank you for reading my post and do take a look at these wonderful libraries. Share the post if you want to. I write about AI, open source tools, Resume Matcher and more.&lt;/p&gt;

&lt;p&gt;These are my handles where you can reach out to me:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/srbhr" class="ltag_cta ltag_cta--branded"&gt;Follow me on DEV&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/srbhr/" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Connect with me on LinkedIn&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/srbhr" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Follow me on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;For collaborations send me an email at: &lt;a href="mailto:srbh077@gmail.com"&gt;srbh077@gmail.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxkjiaqfwknr0pueaj9p.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxkjiaqfwknr0pueaj9p.gif" alt="Thank you" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Vector Podcast from Berlin Buzzwords’24: Sonam Pankaj, EmbedAnything</title>
      <dc:creator>Dmitry Kan</dc:creator>
      <pubDate>Fri, 20 Sep 2024 11:37:37 +0000</pubDate>
      <link>https://dev.to/vectorpodcast/vector-podcast-from-berlin-buzzwords24-sonam-pankaj-embedanything-46dh</link>
      <guid>https://dev.to/vectorpodcast/vector-podcast-from-berlin-buzzwords24-sonam-pankaj-embedanything-46dh</guid>
      <description>&lt;p&gt;I’ve just released an episode with Sonam Pankaj. She works on &lt;a href="https://github.com/StarlightSearch/EmbedAnything" rel="noopener noreferrer"&gt;EmbedAnything&lt;/a&gt;. We have recorded this episode at Berlin Buzzwords back in June, where I also got the chance to test my new audio recording gear (RØDE Wireless GO II).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faf84ml1kxdqbql0z38tq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faf84ml1kxdqbql0z38tq.png" alt="Image description" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;EmbedAnything is an infrastructure layer that lets you embed anything (different text formats, but also other modalities such as audio), written in Rust for performance. It can embed PDF text 40x faster than an equivalent Python implementation.&lt;/p&gt;

&lt;p&gt;We spoke about this project, but also about metric learning, quality assurance and multimodality.&lt;/p&gt;

&lt;p&gt;There are a bunch of show notes with different papers and projects — do check them out.&lt;/p&gt;

&lt;p&gt;Find the episode on these platforms in addition to YouTube:&lt;/p&gt;

&lt;p&gt;RSS: &lt;a href="https://rss.com/podcasts/vector-podcast/1663042/" rel="noopener noreferrer"&gt;https://rss.com/podcasts/vector-podcast/1663042/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Spotify: &lt;a href="https://open.spotify.com/episode/5pUWz19iWKHqUzNT0JQ9KL" rel="noopener noreferrer"&gt;https://open.spotify.com/episode/5pUWz19iWKHqUzNT0JQ9KL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apple Podcasts: &lt;a href="https://podcasts.apple.com/fi/podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything/id1587568733?i=1000670040161" rel="noopener noreferrer"&gt;https://podcasts.apple.com/fi/podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything/id1587568733?i=1000670040161&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Patreon: &lt;a href="https://www.patreon.com/posts/vector-podcast-112350470?utm_medium=clipboard_copy&amp;amp;utm_source=copyLink&amp;amp;utm_campaign=postshare_creator&amp;amp;utm_content=join_link" rel="noopener noreferrer"&gt;https://www.patreon.com/posts/vector-podcast-112350470?utm_medium=clipboard_copy&amp;amp;utm_source=copyLink&amp;amp;utm_campaign=postshare_creator&amp;amp;utm_content=join_link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Big thanks to &lt;a class="mentioned-user" href="https://dev.to/srbhr"&gt;@srbhr&lt;/a&gt; for designing the thumbnail of this episode.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>These 8 Podcasts will help increase your knowledge and expand your mindset.</title>
      <dc:creator>𝚂𝚊𝚞𝚛𝚊𝚋𝚑 𝚁𝚊𝚒</dc:creator>
      <pubDate>Thu, 16 Nov 2023 12:10:29 +0000</pubDate>
      <link>https://dev.to/vectorpodcast/these-8-podcasts-will-help-increase-your-knowledge-and-expand-your-mindset-5lb</link>
      <guid>https://dev.to/vectorpodcast/these-8-podcasts-will-help-increase-your-knowledge-and-expand-your-mindset-5lb</guid>
      <description>&lt;p&gt;I’m a big fan of podcasts, the most underrated pure knowledge resources. Podcasts have emerged as a revolutionary medium in learning and entertainment. In recent years, they have transcended from being a niche listener’s hobby to a mainstream media format.&lt;/p&gt;

&lt;p&gt;Learning from two or more experienced people talking about a specific topic, discussing challenges they faced while doing something, and sharing their journey is fascinating. &lt;/p&gt;

&lt;p&gt;It gives incredible insights into tackling specific problems and building unique solutions. You get to know about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different mindsets and how other people think about things. &lt;/li&gt;
&lt;li&gt;The diverse array of tools that people use to solve problems. &lt;/li&gt;
&lt;li&gt;Prominent people and their stories and how they solve a particular problem.&lt;/li&gt;
&lt;li&gt;How to step outside your mental bubble and see things differently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The other good thing is that the knowledge you acquire by listening to these podcasts becomes part of your tacit knowledge reservoir.&lt;/p&gt;

&lt;p&gt;So, when you face challenges in a similar domain, the knowledge you gained from podcasts can give you a starting point or a roadmap for building a solution.&lt;/p&gt;

&lt;p&gt;As an AI enthusiast working in the knowledge and data domain, I picked up &lt;a href="https://www.youtube.com/@VectorPodcast" rel="noopener noreferrer"&gt;Vector Podcast&lt;/a&gt; because I was curious about vector databases and vector search, and how they are helping companies build artificial intelligence-based products.&lt;/p&gt;

&lt;p&gt;Here are my top 8 podcast recommendations you can listen to. &lt;em&gt;I’m giving their YouTube channels, assuming everyone knows and uses YouTube. Most of them are also on Apple Podcasts, Spotify, etc.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.youtube.com/@VectorPodcast" rel="noopener noreferrer"&gt;Vector Podcast&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjii8stvlc2zhecit0b7p.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjii8stvlc2zhecit0b7p.jpg" alt="Vector Podcast" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vector Podcast is the brainchild of Dmitry Kan, a Ph.D. graduate and research scientist venturing into product management and entrepreneurship. He has also lectured at Helsinki University.&lt;/p&gt;

&lt;p&gt;With over 15 years of experience in search engines, artificial intelligence, and software development, he interviews professionals and CEOs in the field of AI.&lt;/p&gt;

&lt;p&gt;Some of the notable episodes are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Searching with Swirl along with Sid Probstein, CEO of Swirl.&lt;/li&gt;
&lt;li&gt;Vector Databases along with Bob van Luijt, CEO of Weviate.&lt;/li&gt;
&lt;li&gt;Future of Search with Connor Shorten.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/@VectorPodcast" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Subscribe to Vector Podcast&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.youtube.com/c/lexfridman" rel="noopener noreferrer"&gt;Lex Fridman&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzb0jvrfnvqwa6khxhpil.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzb0jvrfnvqwa6khxhpil.jpg" alt="Lex Fridman" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lex Fridman is a popular figure, which lets him host many famous guests: CEOs, leaders, and top-notch researchers. He brings them to the mic and asks them probing questions (he’s pretty good at that).&lt;/p&gt;

&lt;p&gt;Some notable people he interviewed were Sam Altman, Elon Musk, Guido van Rossum (Creator of Python Language), Stephen Wolfram, and a &lt;a href="https://www.youtube.com/watch?v=QqRV5FD8ob4" rel="noopener noreferrer"&gt;Goose&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;He’s also a machine learning instructor; you can view his lectures in this &lt;a href="https://www.youtube.com/playlist?list=PLe8HThjUpqadLD-AewSKkhAyW5Nr4Yq4Z" rel="noopener noreferrer"&gt;playlist&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.youtube.com/@tkppodcast" rel="noopener noreferrer"&gt;The Knowledge Project&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7ix13d6kberdnex6suz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi7ix13d6kberdnex6suz.jpg" alt=" " width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hosted by Shane Parrish of Farnam Street, The Knowledge Project Podcast uncovers the best of what others have already figured out so you can apply their insights to your life.&lt;/p&gt;

&lt;p&gt;The podcast and the insights shared by Shane Parrish are undoubtedly breathtaking. You have to experience The Knowledge Project yourself to understand it well. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://www.youtube.com/@syntaxfm" rel="noopener noreferrer"&gt;Syntax&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcul191m0c2jbdmf6knvg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcul191m0c2jbdmf6knvg.jpg" alt=" " width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Syntax comes from Wes Bos, the creator of JavaScript30.&lt;br&gt;
The podcast covers JavaScript and web development, with fascinating topics ranging from new web-platform features to JavaScript testing. &lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;a href="https://www.youtube.com/@twimlai" rel="noopener noreferrer"&gt;This Week in Machine Learning (TWIML)&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg5z96p0s26pw3i34n8n.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg5z96p0s26pw3i34n8n.jpg" alt="This week in machine learning" width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Join host Sam Charrington as he explores the latest trends and breakthroughs in AI, discusses the practical challenges of bringing AI-powered products to market, and examines the intersection of AI technology with business and consumer applications. &lt;/p&gt;

&lt;p&gt;This podcast is more than just a conversation; it's a gateway to understanding and leveraging the full potential of machine learning and AI to enhance our lives and communities. And I like this podcast a lot!&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;a href="https://www.youtube.com/@Changelog" rel="noopener noreferrer"&gt;The Changelog&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu44gmvwd13veekuukhn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu44gmvwd13veekuukhn.jpg" alt="The Changelog" width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Changelog is more than just a collection of podcasts; it's a dynamic community and a rich resource for developers at all stages of their journey.&lt;/p&gt;

&lt;p&gt;Whether you're a seasoned pro, a curious beginner, or somewhere in between, its shows and resources offer valuable insights, discussions, and stories that resonate with the developer experience.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;a href="https://www.youtube.com/@talkpython" rel="noopener noreferrer"&gt;Talk Python&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjzdyh4s3ow11c73p50f2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjzdyh4s3ow11c73p50f2.jpg" alt="Talk Python" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Talk Python to Me is a weekly podcast hosted by Michael Kennedy. The show covers various Python and related topics (e.g., MongoDB, AngularJS, DevOps).&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;a href="https://www.youtube.com/@BeyondCoding" rel="noopener noreferrer"&gt;Beyond Coding&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguwikoy09l2mv167l9wp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguwikoy09l2mv167l9wp.jpg" alt=" " width="800" height="547"&gt;&lt;/a&gt;&lt;br&gt;
Patrick Akil and his guests share their journeys and perspectives so that you can take them with you and form your own.&lt;/p&gt;

&lt;p&gt;Beyond Coding is a weekly podcast with conversations that go "beyond coding" in a fireside chat format. Typical topics are software engineering, leadership, communication, self-improvement, and happiness.&lt;/p&gt;



&lt;p&gt;💡 &lt;strong&gt;&lt;em&gt;Fun fact:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Did you know that the word “podcast” is a blend of “iPod” and “broadcast”?&lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I hope you liked the list of podcasts featured in this article. If you have any suggestions, feel free to share them in the comments. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check out Vector Podcast on YouTube&lt;/strong&gt;&lt;br&gt;
We’re trying to provide you with the best-in-class content on AI, Large Language Models, and more. &lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/vhQ5LM5pK_Y"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/@VectorPodcast" class="ltag_cta ltag_cta--branded" rel="noopener noreferrer"&gt;Subscribe to Vector Podcast&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>programming</category>
      <category>python</category>
    </item>
  </channel>
</rss>
