High-fidelity HTML-to-Markdown converter with batch processing and site crawling.
npm install
./bin/html2md https://example.com

- Interactive CLI — menu-driven terminal UI, no flags to memorize
- Advanced Stealth — bypasses WAFs/Cloudflare using Puppeteer Extra Stealth & rotated UAs
- Smart URL Normalization — smoothly handles redirects and `www.` subdomains internally
- Single URL — convert any web page to Markdown with near-100% fidelity
- Batch mode — convert dozens of URLs in parallel
- Crawl mode — discover all pages on a domain and convert them all
- Tree view — visualize site structure without converting
- JS rendering — Puppeteer handles React, Next.js, Angular, and other SPAs
- Smart extraction — Readability.js primary, Python Trafilatura fallback
- Image downloading — downloads and rewrites image paths
- GFM support — tables, strikethrough, task lists, fenced code blocks
- YAML front matter — optional metadata header
- Export Formats — Download site maps in `.zip`, `.txt`, `.md`, or `.json` formats
- REST API — Full suite including `/api/convert`, `/api/batch`, `/api/crawl`, `/api/file2md`, `/api/agentify`, and extensive `/api/jobs` management. Features active keep-alive architecture & cross-tab process cancellation.
- Path-Based Web UI — A fast SPA frontend with clean, SEO-friendly routes (`/crawl`, `/map`, `/agentify`, `/file2md`) for easy bookmarking and AI crawler discovery.
- Agentify Pipeline — Convert entire websites into structured, AI-ready Skill Bundles via the Web UI (`/agentify`) or REST API.
- URL-Prepend Shortcut — Instantly parse any page or media file by prepending `https://2md.traylinx.com/` to the URL.
- Async & Email Notifications — Fire-and-forget large batch crawls and receive a branded email with a secure, 72-hour ZIP download link.
- Agentic Uploads (File2MD) — Upload PDFs, images, and audio/video media (MP4, MP3, WAV, YouTube) via the Web UI (`/file2md`) for conversion using Vision models and Whisper-based transcription.
- NDJSON Streaming — Stream real-time progress events from the API or CLI using `--stream`.
- AI Agent Discovery — `/llms.txt` and `/llms-full.txt` let AI agents auto-discover and use the API (spec), alongside `/skills/:skillName.md` for bundle retrieval.
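
For readers who want to consume the `--stream` output programmatically, here is a minimal sketch of an NDJSON line parser in plain Node.js. It only illustrates the general technique (one JSON object per line, with chunks that may split a line anywhere). The helper name `createNdjsonParser` and the event fields `type` and `url` are assumptions made for this example, not the tool's documented event schema.

```javascript
// Minimal NDJSON line parser: buffers partial chunks and emits one parsed
// object per complete line. The event fields used below are hypothetical.
function createNdjsonParser(onEvent) {
  let buffer = '';
  return {
    // Feed a raw text chunk; a chunk may end in the middle of a JSON line.
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep the trailing partial line for the next chunk
      for (const line of lines) {
        const trimmed = line.trim();
        if (trimmed !== '') onEvent(JSON.parse(trimmed));
      }
    },
    // Call once the stream ends to flush a final line that has no newline.
    end() {
      const trimmed = buffer.trim();
      buffer = '';
      if (trimmed !== '') onEvent(JSON.parse(trimmed));
    },
  };
}

// Usage with made-up events, deliberately split across chunk boundaries.
const events = [];
const parser = createNdjsonParser((event) => events.push(event));
parser.push('{"type":"progress","url":"https://example.com/a"}\n{"type":"pro');
parser.push('gress","url":"https://example.com/b"}\n{"type":"done"}');
parser.end();
console.log(events.map((e) => e.type).join(','));
```

In practice the chunks would come from the CLI's stdout or an HTTP response body. Buffering the trailing partial line is what keeps a JSON object that straddles two chunks from being parsed too early.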
# Interactive mode (guided menu)
npm run interactive
# Single URL
./bin/html2md https://example.com
# Batch (file with one URL per line)
./bin/html2md --batch urls.txt
# Crawl entire site
./bin/html2md --crawl https://docs.example.com --depth 3
# View site tree only
./bin/html2md --crawl https://docs.example.com --tree-only
# Start API server
npm start

| Component | Tool |
|---|---|
| Page rendering | Puppeteer (headless Chrome) |
| Content extraction | Readability.js + Trafilatura (Python fallback) |
| HTML → Markdown | Turndown.js + GFM plugin |
| API server | Express.js |
| Concurrency | p-queue |
| Container | Docker (Node 22 + Chrome + Python 3) |
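
To illustrate the bounded-concurrency idea behind the "Concurrency" row, the sketch below runs a batch of URLs with at most two conversions in flight. It is a hand-rolled stand-in written for this example and does not use or mirror p-queue's API. `mapWithConcurrency` and `fakeConvert` are hypothetical names, and the converter only simulates work with a timer.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight.
// Hand-rolled stand-in for illustration; this is not p-queue's API.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // results keep the order of the input items
}

// Hypothetical converter: a real one would render the page and convert it.
let inFlight = 0;
let peak = 0;
async function fakeConvert(url) {
  inFlight += 1;
  peak = Math.max(peak, inFlight); // track the highest observed concurrency
  await new Promise((resolve) => setTimeout(resolve, 10));
  inFlight -= 1;
  return `# ${url}`;
}

const urls = ['a', 'b', 'c', 'd', 'e'].map((p) => `https://example.com/${p}`);
const done = mapWithConcurrency(urls, 2, fakeConvert).then((pages) => {
  console.log(pages.length, 'pages converted, peak concurrency', peak);
  return pages;
});
```

Capping the number of in-flight jobs matters here because each conversion may hold a headless Chrome page open, so unbounded parallelism would exhaust memory long before it improved throughput.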
MIT