API Documentation to RAG Knowledge Base

This is an AI agent that browses API documentation for you. Give it a URL, and it will crawl the site, extract the useful content, and store it in a PostgreSQL knowledge base (via pgvector) that you can query later.

Think of it as a "read this for me" button for API docs. It uses OpenAI's computer-use model (CUA) to navigate the documentation and an embedding model to make the content searchable. It works well for feeding context into LLMs or building smarter developer tools.

How it works

Browser Agent: A headless browser (Playwright) controlled by AI navigates the documentation.
Extraction: It scrapes pages and converts them into structured data.
Embeddings: Text is turned into vectors and stored in PostgreSQL.
Search: You get an API to find relevant docs by meaning, not just keywords.

┌─────────────────┐
│  Browser Agent  │ (CUA + Playwright)
└────────┬────────┘
         │
         ↓
┌─────────────────┐
│  Extractions    │ (Structured cURL examples and documentation extraction)
└────────┬────────┘
         │
         ↓
┌─────────────────┐
│  Embeddings     │ (text-embedding-ada-002)
└────────┬────────┘
         │
         ↓
┌─────────────────┐
│  PostgreSQL +   │ (Vector storage with pgvector)
│    pgvector     │
└─────────────────┘
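
To make the "search by meaning" step concrete, here is a minimal in-memory sketch of ranking stored vectors against a query vector. It assumes cosine similarity, which is one of the distance measures pgvector supports; the project itself may use a different measure, and in practice the ranking would run inside PostgreSQL rather than in application code. The `Doc` type and both function names are hypothetical.

```typescript
// Illustrative only: the real pipeline stores vectors in PostgreSQL with
// pgvector, which would perform this ranking inside the database.
type Doc = { id: string; embedding: number[] };

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k documents whose embeddings are closest to the query.
function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) -
        cosineSimilarity(query, x.embedding),
    )
    .slice(0, k);
}
```

Real embeddings have many more dimensions than the toy vectors you would test this with, but the ranking logic is the same.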

Quick Start

You'll need Node.js, a PostgreSQL instance with the pgvector extension, Redis, and an OpenAI API key.

1. Setup

# Install dependencies
yarn install

# Set up environment
cp .env.example .env
# ... fill in your OPENAI_API_KEY, DATABASE_URL, and Redis connection details
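
If you want to fail fast on a half-filled `.env`, a small check like the following can help. This is a sketch, not part of the project: `OPENAI_API_KEY` and `DATABASE_URL` come from the step above, while `REDIS_URL` is a hypothetical name, so check `.env.example` for the actual Redis keys.

```typescript
// REDIS_URL is a hypothetical variable name; .env.example lists the real ones.
const REQUIRED_VARS = ["OPENAI_API_KEY", "DATABASE_URL", "REDIS_URL"];

// Return the names of required variables that are unset or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

At startup you could call `missingEnvVars(process.env)` and exit with a clear message if the returned list is not empty.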

2. Database & Workers

# Enable vector extension & run migrations
yarn db:setup
yarn db:migrate

# Start the server and the background workers (keep this terminal open!)
yarn start:all

3. Run it

# Start the backend and frontend
yarn dev

Open http://localhost:5173 to see the UI.

Usage

Generate Knowledge Base: POST to /knowledge-base with { "url": "https://docs.example.com" } to start crawling.
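
A crawl request could be sent from TypeScript roughly as follows. The path and body shape come from the line above; the base URL `http://localhost:3000` is an assumption (use whatever host and port your server listens on), and the helper names are hypothetical. The response format is not documented here, so the sketch only returns the HTTP status.

```typescript
// Assumed server address; adjust to match your setup.
const CRAWL_BASE_URL = "http://localhost:3000";

type CrawlRequest = {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

// Build the request without sending it, so it can be inspected or tested.
function buildCrawlRequest(
  docsUrl: string,
  baseUrl: string = CRAWL_BASE_URL,
): CrawlRequest {
  new URL(docsUrl); // throws a TypeError if docsUrl is not a valid absolute URL
  return {
    endpoint: `${baseUrl}/knowledge-base`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: docsUrl }),
    },
  };
}

// Send the request (requires a runtime with global fetch, e.g. Node.js 18+).
async function startCrawl(docsUrl: string): Promise<number> {
  const { endpoint, init } = buildCrawlRequest(docsUrl);
  const response = await fetch(endpoint, init);
  return response.status;
}
```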

Search: GET /api/search?query=auth&url=... to find what you need.
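
When building the search URL in code, it is worth letting `URLSearchParams` handle the encoding, since the `url` value itself contains `:` and `/`. The sketch below assumes that `url` takes the documentation URL that was crawled and that the server runs at `http://localhost:3000`; both are assumptions, and `buildSearchUrl` is a hypothetical helper.

```typescript
// Build a search URL with properly encoded query parameters.
// The default base URL is an assumption; pass your own server address.
function buildSearchUrl(
  query: string,
  docsUrl: string,
  baseUrl: string = "http://localhost:3000",
): string {
  const params = new URLSearchParams({ query, url: docsUrl });
  return `${baseUrl}/api/search?${params.toString()}`;
}
```

You can pass the resulting string straight to `fetch` and parse the response as your server defines it.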

RAG / Agent Context: Use the search endpoint as a tool for your own AI agents. For example, you can expose it as an MCP (Model Context Protocol) server to give your IDE or chat agent direct access to this knowledge.

Example Use Case

This can be used as a RAG database that supplies context to an LLM, or the APIs can be exposed as tools in an MCP server.
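
For the RAG case, the usual pattern is to take the top search results and paste them into the prompt ahead of the user's question. The sketch below shows one way to do that. The `SearchHit` shape is hypothetical (the actual search response may use different field names), and the character budget is an arbitrary illustrative default.

```typescript
// Hypothetical result shape; the real search response may differ.
type SearchHit = { title: string; content: string; sourceUrl: string };

// Concatenate hits into one context string, stopping before the budget is exceeded.
function buildContext(hits: SearchHit[], maxChars: number = 4000): string {
  const parts: string[] = [];
  let used = 0;
  for (const hit of hits) {
    const block = `### ${hit.title}\nSource: ${hit.sourceUrl}\n${hit.content}`;
    if (used + block.length > maxChars) break;
    parts.push(block);
    used += block.length;
  }
  return parts.join("\n\n");
}

// Wrap the retrieved context and the user's question into a single prompt.
function buildPrompt(question: string, hits: SearchHit[]): string {
  const context = buildContext(hits);
  return `Answer using only the documentation below.\n\n${context}\n\nQuestion: ${question}`;
}
```

Keeping the source URL next to each excerpt lets the model cite where an answer came from.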

This example shows how you can use Postman's Agent Mode to generate an MCP server from these APIs and use that MCP server to provide additional context to Agent Mode.

Step 1: Fork the collection.

Step 2: Generate an MCP server using Postman's MCP Server Generator.

Step 3: Connect Agent Mode to the generated MCP server.

Step 4: Prompt Agent Mode and watch it use its tools to query the knowledge base.

Watch a Demo Here

Contributing

If you want to add a feature or fix a bug, feel free to open a PR. We use yarn workspaces, so make sure you run commands in the root or specify the workspace.

License

MIT

Postman-Devrel/api-doc-to-rag

Folders and files

Latest commit

History

Repository files navigation

API Documentation to RAG Knowledge Base

How it works

Quick Start

1. Setup

2. Database & Workers

3. Run it

Usage

Example Use Case

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages