Fast CLI for summarizing anything you can point at:
- Web pages (article extraction; Firecrawl fallback if sites block agents)
- YouTube links (best-effort transcripts, optional Apify fallback)
- Podcast RSS feeds (best-effort: transcribes latest enclosure via Whisper when configured)
- Remote files (PDFs/images/audio/video via URL — downloaded and forwarded to the model)
- Local files (PDFs/images/audio/video/text — forwarded or inlined; support depends on provider/model)
It streams output by default on TTY and renders Markdown to ANSI (via markdansi). At the end it prints a single “Finished in …” line with timing, token usage, and a best-effort cost estimate (when pricing is available).
Requires Node 22+.
- npx (no install): `npx -y @steipete/summarize "https://example.com"`
- npm (global install): `npm i -g @steipete/summarize`
- Homebrew (custom tap): `brew install steipete/tap/summarize` (Apple Silicon only, arm64)
summarize "https://example.com"Input can be a URL or a local file path:
npx -y @steipete/summarize "/path/to/file.pdf" --model google/gemini-3-flash-preview
npx -y @steipete/summarize "/path/to/image.jpeg" --model google/gemini-3-flash-previewRemote file URLs work the same (best-effort; the file is downloaded and passed to the model):
npx -y @steipete/summarize "https://example.com/report.pdf" --model google/gemini-3-flash-previewYouTube (supports youtube.com and youtu.be):
npx -y @steipete/summarize "https://youtu.be/dQw4w9WgXcQ" --youtube autoPodcast RSS feed (transcribes latest episode enclosure):
npx -y @steipete/summarize "https://feeds.npr.org/500005/podcast.xml"Apple Podcasts episode page (extracts stream URL, transcribes via Whisper):
npx -y @steipete/summarize "https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432"Spotify episode page (best-effort; resolves to full episode via iTunes/RSS enclosure when available — not preview clips; may fail for Spotify-exclusive shows):
npx -y @steipete/summarize "https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY"This is “best effort” and depends on what your selected model/provider accepts. In practice these usually work well:
- `text/*` and common structured text (`.txt`, `.md`, `.json`, `.yaml`, `.xml`, …)
  - Text-like files are inlined into the prompt (instead of attached as a file part) for better provider compatibility.
- PDFs: `application/pdf` (provider support varies; Google is the most reliable in this CLI right now)
- Images: `image/jpeg`, `image/png`, `image/webp`, `image/gif`
- Audio/Video: `audio/*`, `video/*` (when supported by the model)
Notes:
- If a provider rejects a media type, the CLI fails fast with a friendly message (no “mystery stack traces”).
- xAI models currently don’t support attaching generic files (like PDFs) via the AI SDK; use a Google/OpenAI/Anthropic model for those.
Use “gateway-style” ids: <provider>/<model>.
Examples:
- `openai/gpt-5-mini`
- `anthropic/claude-sonnet-4-5`
- `xai/grok-4-fast-non-reasoning`
- `google/gemini-3-flash-preview`
- `zai/glm-4.7`
- `openrouter/openai/gpt-5-mini` (force OpenRouter)
Note: some models/providers don’t support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider).
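For example, a run pinned to one specific provider/model might look like the sketch below (the URL is a placeholder, and the matching provider key must be exported — see the API keys section):

```sh
# Placeholder URL and key; pins the run to a single provider/model.
export ANTHROPIC_API_KEY=...
summarize "https://example.com" --model anthropic/claude-sonnet-4-5
```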
`--length` controls how much output we ask for (a guideline), not a hard truncation.

`npx -y @steipete/summarize "https://example.com" --length long`
`npx -y @steipete/summarize "https://example.com" --length 20k`

- Presets: `short|medium|long|xl|xxl`
- Character targets: `1500`, `20k`, `20000`
- Optional hard cap: `--max-output-tokens <count>` (e.g. `2000`, `2k`)
  - Provider/model APIs still enforce their own maximum output limits.
  - If omitted, no max-token parameter is sent (provider default).
  - Prefer `--length` unless you need a hard cap (some providers count “reasoning” tokens into the cap).
- Minimums: numeric `--length` values must be ≥ 50 chars; `--max-output-tokens` must be ≥ 16.
- Text inputs over 10 MB are rejected before tokenization.
- Text prompts are preflighted against the model’s input limit (LiteLLM catalog), using a GPT tokenizer.
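As a concrete illustration, the two knobs can be combined; the run below (URL is just a placeholder) asks for roughly 20k characters of output while also enforcing a hard token cap:

```sh
# Placeholder URL; targets ~20k characters but hard-caps output at 2k tokens.
npx -y @steipete/summarize "https://example.com" --length 20k --max-output-tokens 2k
```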
`npx -y @steipete/summarize <input> [flags]`

- `--model <provider/model>`: which model to use (defaults to `auto`)
  - `--model auto`: automatic model selection + fallback (default)
  - `--model <name>`: use a config-defined model (see “Configuration”)
- `--language <lang>` / `--lang <lang>`: output language (default `auto` = match the source; or `en`, `de`, `english`, `german`, ...)
- `--timeout <duration>`: `30s`, `2m`, `5000ms` (default `2m`)
- `--retries <count>`: LLM retry attempts on timeout (default `1`)
- `--length short|medium|long|xl|xxl|<chars>`
- `--max-output-tokens <count>`: hard cap for LLM output tokens (optional; only sent when set)
- `--cli [provider]`: use a CLI provider (case-insensitive; equivalent to `--model cli/<provider>`). If omitted, uses auto selection with CLI enabled.
- `--stream auto|on|off`: stream LLM output (`auto` = TTY only; disabled in `--json` mode)
- `--render auto|md-live|md|plain`: Markdown rendering (`auto` = best default for TTY)
- `--format md|text`: website/file content format (default `text`)
- `--preprocess off|auto|always`: controls `uvx markitdown` usage (default `auto`; `always` forces file preprocessing)
  - Install `uvx`: `brew install uv` (or https://astral.sh/uv/)
- `--extract`: print extracted content and exit (no summary) — only for URLs
  - Deprecated alias: `--extract-only`
- `--json`: machine-readable output with diagnostics, prompt, metrics, and optional summary
- `--verbose`: debug/diagnostics on stderr
- `--metrics off|on|detailed`: metrics output (default `on`; `detailed` adds a compact 2nd-line breakdown on stderr)
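As a sketch of how these flags compose (the URL is just a placeholder), a scripted, non-interactive run might look like:

```sh
# Placeholder URL; JSON output with detailed metrics and a shorter timeout.
npx -y @steipete/summarize "https://example.com" \
  --model openai/gpt-5-mini --length short \
  --json --metrics detailed --timeout 1m
```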
`--model auto` builds candidate attempts from built-in rules (or your `model.rules` overrides).

CLI tools are not used in auto mode unless you explicitly enable them via `cli.enabled` in config. Why: CLI adds ~4s latency per attempt and higher variance. Shortcut: `--cli` (with no provider) uses auto selection with CLI enabled.

When enabled, auto prepends CLI attempts in the order listed in `cli.enabled` (recommended: `["gemini"]`), then tries the native provider candidates (with OpenRouter fallbacks when configured).
Enable CLI attempts:
{
"cli": { "enabled": ["gemini"] }
}

Disable CLI attempts:

{
"cli": { "enabled": [] }
}

Note: when `cli.enabled` is set, it’s also an allowlist for explicit `--cli` / `--model cli/...`.
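With that allowlist in place, an explicit CLI run would look roughly like the sketch below (assuming the `gemini` CLI is installed and on your PATH):

```sh
# Explicit CLI provider; equivalent to --model cli/gemini (must appear in cli.enabled if that key is set).
summarize "https://example.com" --cli gemini
```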
Non-YouTube URLs go through a “fetch → extract” pipeline. When the direct fetch/extraction is blocked or too thin, `--firecrawl auto` can fall back to Firecrawl (if configured).

- `--firecrawl off|auto|always` (default `auto`)
- `--extract --format md|text` (default `text`)
- `--markdown-mode off|auto|llm` (default `auto`; only affects `--format md` for non-YouTube URLs)
  - `auto`: use an LLM converter when configured; may fall back to `uvx markitdown`
  - `llm`: force LLM conversion (requires a configured model key)
  - `off`: disable LLM conversion (still may return Firecrawl Markdown when configured)
- Plain-text mode: use `--format text`.
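For instance, to inspect what the extractor sees for a stubborn page (the URL is a placeholder), something along these lines should work:

```sh
# Dump extracted Markdown only (no summary), forcing the Firecrawl fallback.
npx -y @steipete/summarize "https://example.com/article" --extract --format md --firecrawl always
```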
`--youtube auto` tries best-effort web transcript endpoints first. When captions aren't available, it falls back to:

- Apify (if `APIFY_API_TOKEN` is set): uses a scraping actor (`faVsWy9VTSNVIhWpR`)
- yt-dlp + Whisper (if `YT_DLP_PATH` is set): downloads audio via yt-dlp, transcribes with local `whisper.cpp` when installed (preferred), otherwise falls back to OpenAI (`OPENAI_API_KEY`) or FAL (`FAL_KEY`)

Environment variables for yt-dlp mode:

- `YT_DLP_PATH` - path to the yt-dlp binary
- `SUMMARIZE_WHISPER_CPP_MODEL_PATH` - optional override for the local `whisper.cpp` model file
- `SUMMARIZE_WHISPER_CPP_BINARY` - optional override for the local binary (default: `whisper-cli`)
- `SUMMARIZE_DISABLE_LOCAL_WHISPER_CPP=1` - disable local whisper.cpp (force remote)
- `OPENAI_API_KEY` - OpenAI Whisper transcription
- `FAL_KEY` - FAL AI Whisper fallback
Apify costs money but tends to be more reliable when captions exist.
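A minimal yt-dlp fallback setup might look like the sketch below (the binary path and key are placeholders; only `YT_DLP_PATH` is required, the rest are optional overrides):

```sh
# Placeholder path/key; enables the yt-dlp + Whisper fallback for videos without captions.
export YT_DLP_PATH=/opt/homebrew/bin/yt-dlp
export OPENAI_API_KEY=...   # used only if local whisper.cpp is not installed
summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto
```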
`--video-mode transcript` forces audio/video inputs (local files or direct media URLs) through Whisper first, then summarizes the transcript text. It prefers local `whisper.cpp` when available; otherwise it requires `OPENAI_API_KEY` or `FAL_KEY`.
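For a local recording, that path might be exercised like this (the file path is just a placeholder):

```sh
# Placeholder file; transcribe first (whisper.cpp or OPENAI_API_KEY/FAL_KEY), then summarize the transcript.
summarize "/path/to/recording.mp4" --video-mode transcript --length medium
```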
Single config location: `~/.summarize/config.json`

Supported keys today:
{
"model": { "id": "openai/gpt-5-mini" }
}

Shorthand (equivalent):

{
"model": "openai/gpt-5-mini"
}

Also supported:

- `model: { "mode": "auto" }` (automatic model selection + fallback; see `docs/model-auto.md`)
- `model.rules` (customize candidates / ordering)
- `models` (define presets selectable via `--model <preset>`)
- `media.videoMode: "auto" | "transcript" | "understand"`
- `openai.useChatCompletions: true` (force OpenAI-compatible chat completions)
Note: the config is parsed leniently (JSON5), but comments are not allowed. Unknown keys are ignored.
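Putting a few of these keys together, a config might look like the sketch below. This is only an illustration: nesting `videoMode` under a `media` object follows the dotted key listed above and is an assumption, not a verified schema.

```sh
# Illustrative ~/.summarize/config.json combining documented keys (overwrites any existing config).
# Assumption: media.videoMode is written as a nested "media" object.
cat > ~/.summarize/config.json <<'EOF'
{
  "model": "openai/gpt-5-mini",
  "cli": { "enabled": ["gemini"] },
  "media": { "videoMode": "transcript" }
}
EOF
```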
Precedence:

- `--model`
- `SUMMARIZE_MODEL`
- `~/.summarize/config.json`
- default (`auto`)
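So, for example, an environment default can be overridden per invocation (the model ids and URL here are only illustrations):

```sh
# The env var wins over the config file, but the flag wins over both.
export SUMMARIZE_MODEL=openai/gpt-5-mini
summarize "https://example.com"                                          # uses openai/gpt-5-mini
summarize "https://example.com" --model google/gemini-3-flash-preview    # flag overrides the env var
```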
Set the key matching your chosen `--model`:

- `OPENAI_API_KEY` (for `openai/...`)
- `ANTHROPIC_API_KEY` (for `anthropic/...`)
- `XAI_API_KEY` (for `xai/...`)
- `Z_AI_API_KEY` (for `zai/...`; supports `ZAI_API_KEY` alias)
- `GEMINI_API_KEY` (for `google/...`)
  - also accepts `GOOGLE_GENERATIVE_AI_API_KEY` and `GOOGLE_API_KEY` as aliases

OpenAI-compatible chat completions toggle: `OPENAI_USE_CHAT_COMPLETIONS=1` (or set `openai.useChatCompletions` in config)
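A hedged sketch of using that toggle (the key and URL are placeholders):

```sh
# Placeholder key/URL; forces the chat-completions code path for openai/... models.
OPENAI_API_KEY=... OPENAI_USE_CHAT_COMPLETIONS=1 \
  summarize "https://example.com" --model openai/gpt-5-mini
```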
OpenRouter (OpenAI-compatible):
- Set `OPENROUTER_API_KEY=...`
- Prefer forcing OpenRouter per model id: `--model openrouter/<author>/<slug>` (e.g. `openrouter/meta-llama/llama-3.1-8b-instruct:free`)
- Built-in preset: `--model free` (uses a default set of OpenRouter `:free` models).
Quick start: make free the default (keep auto available)
# writes ~/.summarize/config.json (models.free) and sets model="free"
summarize refresh-free --set-default
# now this defaults to free models
summarize "https://example.com"
# whenever you want best quality instead
summarize "https://example.com" --model autoRegenerates the free preset (writes models.free into ~/.summarize/config.json) by:
- Fetching OpenRouter `/models`, filtering `:free`
- Skipping models that look very small (<27B by default) based on the model id/name (best-effort heuristic)
- Testing which ones return non-empty text (concurrency 4, timeout 10s)
- Picking a mix of “smart-ish” (bigger `context_length` / output cap) and fast models
- Refining timings for the final selection and writing the sorted list back
If `--model free` stops working (rate limits, allowed-provider restrictions, models removed), run: `summarize refresh-free`

Flags:

- `--runs 2` (default): extra timing runs per selected model (total runs = 1 + runs)
- `--smart 3` (default): how many “smart-first” picks (rest filled by fastest)
- `--min-params 27b` (default): ignore models with an inferred size smaller than N billion parameters
- `--max-age-days 180` (default): ignore models older than N days (set 0 to disable)
- `--set-default`: also sets `"model": "free"` in `~/.summarize/config.json`
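For example, a stricter refresh could be sketched like this (the flag values are only illustrations; the defaults above usually suffice):

```sh
# Keep only larger, recent free models; take 5 smart-first picks; write "free" as the default model.
summarize refresh-free --min-params 30b --max-age-days 90 --smart 5 --set-default
```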
Example:
`OPENROUTER_API_KEY=sk-or-... summarize "https://example.com" --model openrouter/meta-llama/llama-3.1-8b-instruct:free`

If your OpenRouter account enforces an allowed-provider list, make sure at least one provider is allowed for the selected model. (When routing fails, summarize prints the exact providers to allow.)
Legacy: `OPENAI_BASE_URL=https://openrouter.ai/api/v1` (and either `OPENAI_API_KEY` or `OPENROUTER_API_KEY`) also works.
Z.AI (OpenAI-compatible):
- `Z_AI_API_KEY=...` (or `ZAI_API_KEY=...`)
- Optional base URL override: `Z_AI_BASE_URL=...`
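A run against Z.AI might then look like the sketch below (the key is a placeholder; `zai/glm-4.7` is the id listed in the model examples above):

```sh
# Placeholder key; Z_AI_BASE_URL only needs to be set for non-default endpoints.
Z_AI_API_KEY=... summarize "https://example.com" --model zai/glm-4.7
```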
Optional services:
- `FIRECRAWL_API_KEY` (website extraction fallback)
- `YT_DLP_PATH` (path to the yt-dlp binary for audio extraction)
- `FAL_KEY` (FAL AI API key for audio transcription via Whisper)
- `APIFY_API_TOKEN` (YouTube transcript fallback)
The CLI uses the LiteLLM model catalog for model limits (like max output tokens):
- Downloaded from: https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json
- Cached at: `~/.summarize/cache/`
This package also exports a small library:
- `@steipete/summarize/content`
- `@steipete/summarize/prompts`
`pnpm install`
`pnpm check`