feat(core,community): ssrf hardening #9990
Port the Python SSRF protection module to TypeScript.

Includes:
- IP validation against private ranges (RFC 1918, loopback, link-local)
- Cloud metadata endpoint detection (AWS, GCP, Azure)
- Localhost detection (including 127.x range)
- Safe URL validation with DNS resolution
- Comprehensive test suite (42 tests)

Functions exported:
- isPrivateIp: Check if IP is in private ranges
- isCloudMetadata: Detect cloud metadata endpoints
- isLocalhost: Detect localhost variations
- validateSafeUrl: Validate URLs are safe to connect to
- isSafeUrl: Non-throwing URL safety check

Cloud metadata is always blocked, even with allowPrivate flag. Uses Node.js dns/promises for DNS resolution.
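To make the range checks listed above concrete, here is a minimal sketch of IPv4-only classification. The function names mirror the exports in the commit message, but the bodies are assumptions written for illustration, not the code merged in this PR; `parseIpv4` is a hypothetical helper, and IPv6 and DNS resolution are left out.

```typescript
// Illustrative sketch only: not the implementation from this PR.

/** Hypothetical helper: parse a dotted-quad string into four octets, or null if malformed. */
function parseIpv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const value = Number(part);
    if (value > 255) return null;
    octets.push(value);
  }
  return octets;
}

/** True for RFC 1918, loopback (127.0.0.0/8), and link-local (169.254.0.0/16) addresses. */
function isPrivateIp(ip: string): boolean {
  const octets = parseIpv4(ip);
  if (octets === null) return false;
  const [a, b] = octets;
  if (a === 10) return true; // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a === 192 && b === 168) return true; // 192.168.0.0/16
  if (a === 127) return true; // loopback
  if (a === 169 && b === 254) return true; // link-local
  return false;
}

/** True for two well-known metadata hosts (the list here is an assumption and is not exhaustive). */
function isCloudMetadata(hostname: string): boolean {
  const host = hostname.toLowerCase();
  return host === "169.254.169.254" || host === "metadata.google.internal";
}
```

Checking the metadata hosts in a separate function is what lets a caller keep them blocked even when private ranges are explicitly allowed.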
Replace unsafe string-based URL origin checking with semantic URL comparison to prevent SSRF bypasses via crafted subdomains. The preventOutside option now correctly uses URL.origin comparison instead of String.startsWith(), which prevents bypasses like 'https://example.com.attacker.com'.

- Use semantic URL parsing for origin comparison in RecursiveUrlLoader.getChildLinks
- Add private isSameOrigin() helper method for safe URL origin validation
- Add comprehensive unit tests for SSRF vulnerability detection
- Fixes SSRF advisory: startsWith() check vulnerable to subdomain-based bypasses

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
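The bypass described above can be reproduced in a few lines. This is a sketch of an origin comparison built on the standard `URL` class; the function body is an assumption about how such a helper might look, not the helper added in this commit.

```typescript
// Illustrative sketch only: compares parsed origins instead of string prefixes.
function isSameOrigin(url: string, baseUrl: string): boolean {
  try {
    // origin covers scheme, host, and port, so a crafted host cannot match by prefix.
    return new URL(url).origin === new URL(baseUrl).origin;
  } catch {
    // Unparseable input is treated as a different origin.
    return false;
  }
}

const base = "https://example.com";
const crafted = "https://example.com.attacker.com/page";

console.log(crafted.startsWith(base)); // true: the prefix check is fooled
console.log(isSameOrigin(crafted, base)); // false: the hosts differ
```

Returning `false` on a parse failure keeps the check fail-closed, which is usually the safer default for a crawler that follows links it did not author.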
- Move src/_security/ssrf_protection.ts to src/utils/ssrf_protection.ts
- Move src/_security/tests/ssrf_protection.test.ts to src/utils/tests/ssrf_protection.test.ts
- Remove _security/ directory entirely
- Add ./utils/ssrf_protection export to package.json
- Update tsdown.config.ts to include ssrf_protection in build
- Auto-generated import_map.ts update includes ssrf_protection export

All 42 tests pass. Lint and build successful.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Rename src/utils/ssrf_protection.ts to src/utils/ssrf.ts
- Rename test file src/utils/tests/ssrf_protection.test.ts to src/utils/tests/ssrf.test.ts
- Update import in test file to reference ../ssrf.js
- Update package.json export key from ./utils/ssrf_protection to ./utils/ssrf
- Update input path in package.json from ./src/utils/ssrf_protection.ts to ./src/utils/ssrf.ts
- Update tsdown.config.ts entry from ./src/utils/ssrf_protection.ts to ./src/utils/ssrf.ts
- Update import_map.ts export from utils__ssrf_protection to utils__ssrf

All tests pass (42 tests), lint passes, and build completes successfully.
…Loader

- Add isSameOrigin export to @langchain/core/utils/ssrf.ts for checking URL origin equality
- Add comprehensive tests for isSameOrigin covering same origin, different schemes/hosts/ports, invalid URLs, and subdomains
- Update RecursiveUrlLoader to import and use shared isSameOrigin and validateSafeUrl utilities
- Remove private isSameOrigin method from RecursiveUrlLoader (now uses shared implementation)
- Add SSRF validation before all fetch calls in getUrlAsDoc and getChildUrlsRecursive with allowHttp: true flag
- All tests pass, lint passes, builds succeed for both @langchain/core and @langchain/community
…1.1.21

RecursiveUrlLoader now imports from @langchain/core/utils/ssrf, which is available in v1.1.20+. Update the peer dependency lower bound to ensure users cannot install an incompatible version of @langchain/core.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- feat(core): Add SSRF protection module
- fix(community): Harden RecursiveUrlLoader against SSRF attacks
🦋 Changeset detected

Latest commit: 812884e

The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages.
- Replace dns/promises and net imports with pure JS implementations
- Add IPv4/IPv6 validation using regex and string parsing (no Node.js deps)
- Remove net.Socket usage from expandIpv6
- Use dynamic import for DNS resolution with graceful fallback for non-Node environments
- All 50 SSRF tests pass
- Module now works in browsers, Cloudflare Workers, Deno, Vercel Edge, etc.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
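As an illustration of what "regex and string parsing" can look like without the Node.js `net` module, here is a sketch of IPv6 expansion. It borrows the name `expandIpv6` from the commit message, but the body is an assumption and it deliberately ignores IPv4-mapped forms such as `::ffff:1.2.3.4` and zone identifiers.

```typescript
// Illustrative sketch only: pure string handling, no Node.js imports.

/** Expand an IPv6 address to eight 4-digit lowercase hex groups, or return null if malformed. */
function expandIpv6(ip: string): string | null {
  const halves = ip.split("::");
  if (halves.length > 2) return null; // "::" may appear at most once

  const head = halves[0] ? halves[0].split(":") : [];
  const tail = halves.length === 2 && halves[1] ? halves[1].split(":") : [];
  const missing = 8 - head.length - tail.length;

  // Without "::" all eight groups must be present; with it, at least one group is elided.
  if (halves.length === 1 ? missing !== 0 : missing < 1) return null;

  const groups = [...head, ...new Array<string>(missing).fill("0"), ...tail];
  if (!groups.every((g) => /^[0-9a-fA-F]{1,4}$/.test(g))) return null;

  return groups.map((g) => g.padStart(4, "0").toLowerCase()).join(":");
}
```

Once an address is in this canonical form, prefix checks such as loopback or link-local reduce to simple string or numeric comparisons on the leading groups.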
… resolution

- Removed resolveDns function and its Node.js-only dynamic import
- Changed validateSafeUrl from async to sync (Promise<string> -> string)
- Changed isSafeUrl from async to sync (Promise<boolean> -> boolean)
- Updated all test cases to remove 'await' keywords
- Updated RecursiveUrlLoader to remove 'await' before validateSafeUrl call
- Functions now perform only static hostname/IP validation without DNS lookups
- Still blocks private IPs, cloud metadata endpoints, and localhost as before
- Works across all environments (Node.js, browsers, workers) without DNS

Tests: 50 passing
Build: ✓ @langchain/core ✓ @langchain/community
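A synchronous, static validator along the lines described above might look roughly like the following. The option names `allowHttp` and `allowPrivate` come from this PR's text; everything else (the `SafeUrlOptions` interface, the `isPrivateHost` helper, the error messages, the exact host list) is an assumption made for this sketch, which handles IPv4 literals only.

```typescript
// Illustrative sketch only: static hostname/IP checks, no DNS lookups.
interface SafeUrlOptions {
  allowHttp?: boolean;
  allowPrivate?: boolean;
}

// Assumed, non-exhaustive list of metadata hosts.
const METADATA_HOSTS = new Set(["169.254.169.254", "metadata.google.internal"]);

/** Hypothetical helper: true for "localhost" and private/loopback/link-local IPv4 literals. */
function isPrivateHost(hostname: string): boolean {
  if (hostname === "localhost") return true;
  const m = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(hostname);
  if (m === null) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

/** Returns the URL unchanged if it passes the static checks, otherwise throws. */
function validateSafeUrl(url: string, options: SafeUrlOptions = {}): string {
  const parsed = new URL(url); // throws on malformed input
  const isHttps = parsed.protocol === "https:";
  const isAllowedHttp = parsed.protocol === "http:" && options.allowHttp === true;
  if (!isHttps && !isAllowedHttp) {
    throw new Error(`Scheme not allowed: ${parsed.protocol}`);
  }
  const host = parsed.hostname.toLowerCase();
  // Metadata hosts are rejected before allowPrivate is consulted.
  if (METADATA_HOSTS.has(host)) {
    throw new Error(`Cloud metadata endpoint blocked: ${host}`);
  }
  if (!options.allowPrivate && isPrivateHost(host)) {
    throw new Error(`Private or local address blocked: ${host}`);
  }
  return url;
}

/** Non-throwing wrapper around validateSafeUrl. */
function isSafeUrl(url: string, options: SafeUrlOptions = {}): boolean {
  try {
    validateSafeUrl(url, options);
    return true;
  } catch {
    return false;
  }
}
```

Because no DNS lookup happens, a check of this shape only inspects the hostname as written; a public-looking name that resolves to a private address is outside what static validation can detect.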
Christian Bromann (christian-bromann) left a comment
Thanks for tackling this important security issue! This is an excellent implementation of SSRF protection that addresses critical vulnerabilities in the RecursiveUrlLoader. The new utilities module is comprehensive and well-tested. LGTM 👍
const prefixLen = parseInt(prefixStr, 10);
if (isNaN(prefixLen)) {
  return null;
Consider adding validation for the prefixLen parameter to ensure it's non-negative:

Suggested change:
- return null;
+ const prefixLen = parseInt(prefixStr, 10);
+ if (isNaN(prefixLen) || prefixLen < 0) {
+   return null;
+ }
private async getUrlAsDoc(url: string): Promise<Document | null> {
  let res;
  try {
    validateSafeUrl(url, { allowHttp: true });
The validateSafeUrl call is synchronous but placed before an async operation. Consider awaiting it for consistency with the other call on line 174:
Suggested change:
- validateSafeUrl(url, { allowHttp: true });
+ await validateSafeUrl(url, { allowHttp: true });
@@ -0,0 +1,83 @@
import { test, describe, expect } from "@jest/globals";
import { RecursiveUrlLoader } from "../web/recursive_url.js";
Consider adding an integration test that actually instantiates the loader and verifies the SSRF protection works end-to-end, rather than just testing the logic in isolation.
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Summary
New exports