feat(core,community): ssrf hardening #9990

Merged
Hunter Lovell (hntrl) merged 13 commits into main from hunter/ssrf on Feb 11, 2026

@hntrl Hunter Lovell (hntrl) commented Feb 10, 2026

Summary

  • Add new @langchain/core/utils/ssrf module with URL validation utilities to protect against SSRF attacks (private IPs, cloud metadata endpoints, localhost)
  • Harden RecursiveUrlLoader in @langchain/community by integrating shared SSRF utilities — replaces vulnerable startsWith URL matching with origin-based comparison and adds validateSafeUrl before all fetch operations
  • Bump @langchain/core peer dependency in @langchain/community to >=1.1.21

New exports

  • @langchain/core/utils/ssrf:
    • validateSafeUrl(url, options?) — async URL validation with DNS resolution (throws on unsafe URLs)
    • isSafeUrl(url, options?) — non-throwing boolean wrapper
    • isPrivateIp(ip) — checks RFC 1918, loopback, link-local ranges
    • isCloudMetadata(hostname, ip?) — detects cloud metadata endpoints (AWS, GCP, Azure)
    • isLocalhost(hostname, ip?) — detects localhost variations
    • isSameOrigin(url1, url2) — origin-based URL comparison

Port the Python SSRF protection module to TypeScript. Includes:
- IP validation against private ranges (RFC 1918, loopback, link-local)
- Cloud metadata endpoint detection (AWS, GCP, Azure)
- Localhost detection (including 127.x range)
- Safe URL validation with DNS resolution
- Comprehensive test suite (42 tests)
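
For readers unfamiliar with the ranges listed above, here is a minimal sketch of an IPv4-only private-range check. It is not the code merged in this PR: the names `parseIpv4Sketch` and `isPrivateIpv4Sketch` are hypothetical, IPv6 is ignored, and treating a malformed address as "not private" is an assumption a stricter validator might reverse.

```typescript
// Hypothetical sketch, not the implementation from this PR.
// Parse a dotted-quad IPv4 string into four octets, or return null if malformed.
function parseIpv4Sketch(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    // Accept only 1 to 3 plain decimal digits per octet.
    if (!/^\d{1,3}$/.test(part)) return null;
    const value = Number(part);
    if (value > 255) return null;
    octets.push(value);
  }
  return octets;
}

// Returns true for RFC 1918, loopback, and link-local IPv4 addresses.
// Assumption: malformed input returns false; callers may prefer to reject it.
function isPrivateIpv4Sketch(ip: string): boolean {
  const octets = parseIpv4Sketch(ip);
  if (octets === null) return false;
  const a = octets[0];
  const b = octets[1];
  if (a === 10) return true; // 10.0.0.0/8 (RFC 1918)
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12 (RFC 1918)
  if (a === 192 && b === 168) return true; // 192.168.0.0/16 (RFC 1918)
  if (a === 127) return true; // 127.0.0.0/8 (loopback)
  if (a === 169 && b === 254) return true; // 169.254.0.0/16 (link-local)
  return false;
}
```

Note that the link-local range also covers 169.254.169.254, the address commonly used for cloud metadata services.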

Functions exported:
- isPrivateIp: Check if IP is in private ranges
- isCloudMetadata: Detect cloud metadata endpoints
- isLocalhost: Detect localhost variations
- validateSafeUrl: Validate URLs are safe to connect to
- isSafeUrl: Non-throwing URL safety check

Cloud metadata is always blocked, even with allowPrivate flag.
Uses Node.js dns/promises for DNS resolution.
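
The "always blocked" rule for cloud metadata can be read as an ordering constraint: the metadata check runs first and ignores the allowPrivate flag. The sketch below illustrates that ordering only. The function name `isBlockedHostSketch`, the `isPrivate` parameter, and the two-entry host list are assumptions for illustration, not the module's actual data or API.

```typescript
// Hypothetical sketch of the check ordering, not the code from this PR.
// Two well-known metadata hosts; a real list would likely be longer.
const METADATA_HOSTS_SKETCH = new Set<string>([
  "169.254.169.254", // metadata address used by AWS, GCP, and Azure
  "metadata.google.internal", // GCP metadata hostname
]);

interface SketchFlags {
  allowPrivate?: boolean; // flag name taken from the commit message above
}

// `isPrivate` stands in for the result of a private-range check done elsewhere.
function isBlockedHostSketch(
  host: string,
  isPrivate: boolean,
  flags: SketchFlags = {}
): boolean {
  // 1. Metadata endpoints are rejected unconditionally.
  if (METADATA_HOSTS_SKETCH.has(host.toLowerCase())) return true;
  // 2. Only after that may allowPrivate relax the private-range rule.
  if (isPrivate && !flags.allowPrivate) return true;
  return false;
}
```

With this ordering, `isBlockedHostSketch("169.254.169.254", true, { allowPrivate: true })` still reports the host as blocked, while an ordinary private address such as 10.0.0.5 is let through by the same flag.
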
Replace unsafe string-based URL origin checking with semantic URL comparison
to prevent SSRF bypasses via crafted subdomains.

The preventOutside option now correctly uses URL.origin comparison instead of
String.startsWith(), which prevents bypasses like 'https://example.com.attacker.com'.
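
To make the bypass concrete, the following sketch contrasts a prefix check with an origin comparison built on the standard `URL` class. The function names are hypothetical and this is not the shared `isSameOrigin` implementation; the guard against the opaque `"null"` origin is an extra assumption added here.

```typescript
// The style of check described above as vulnerable: plain string prefix matching.
function startsWithCheckSketch(baseUrl: string, candidate: string): boolean {
  return candidate.startsWith(baseUrl);
}

// Hypothetical origin-based comparison (scheme + host + port must all match).
function sameOriginSketch(url1: string, url2: string): boolean {
  try {
    const a = new URL(url1).origin;
    const b = new URL(url2).origin;
    // Opaque origins serialize as "null"; treat them as never matching.
    return a !== "null" && a === b;
  } catch {
    // Unparseable input is never considered same-origin.
    return false;
  }
}

const base = "https://example.com";
const crafted = "https://example.com.attacker.com/page";

console.log(startsWithCheckSketch(base, crafted)); // true: the prefix check is fooled
console.log(sameOriginSketch(base, crafted)); // false: the hosts differ
```

The prefix check passes because the attacker-controlled hostname merely begins with the trusted one, whereas the parsed origins are `https://example.com` and `https://example.com.attacker.com`.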

- Use semantic URL parsing for origin comparison in RecursiveUrlLoader.getChildLinks
- Add private isSameOrigin() helper method for safe URL origin validation
- Add comprehensive unit tests for SSRF vulnerability detection
- Fixes SSRF advisory: startsWith() check vulnerable to subdomain-based bypasses

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- Move src/_security/ssrf_protection.ts to src/utils/ssrf_protection.ts
- Move src/_security/tests/ssrf_protection.test.ts to src/utils/tests/ssrf_protection.test.ts
- Remove _security/ directory entirely
- Add ./utils/ssrf_protection export to package.json
- Update tsdown.config.ts to include ssrf_protection in build
- Auto-generated import_map.ts update includes ssrf_protection export

All 42 tests pass. Lint and build successful.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- Rename src/utils/ssrf_protection.ts to src/utils/ssrf.ts
- Rename test file src/utils/tests/ssrf_protection.test.ts to src/utils/tests/ssrf.test.ts
- Update import in test file to reference ../ssrf.js
- Update package.json export key from ./utils/ssrf_protection to ./utils/ssrf
- Update input path in package.json from ./src/utils/ssrf_protection.ts to ./src/utils/ssrf.ts
- Update tsdown.config.ts entry from ./src/utils/ssrf_protection.ts to ./src/utils/ssrf.ts
- Update import_map.ts export from utils__ssrf_protection to utils__ssrf

All tests pass (42 tests), lint passes, and build completes successfully.

…Loader

- Add isSameOrigin export to @langchain/core/utils/ssrf.ts for checking URL origin equality
- Add comprehensive tests for isSameOrigin covering same origin, different schemes/hosts/ports, invalid URLs, and subdomains
- Update RecursiveUrlLoader to import and use shared isSameOrigin and validateSafeUrl utilities
- Remove private isSameOrigin method from RecursiveUrlLoader (now uses shared implementation)
- Add SSRF validation before all fetch calls in getUrlAsDoc and getChildUrlsRecursive with allowHttp: true flag
- All tests pass, lint passes, builds succeed for both @langchain/core and @langchain/community

…1.1.21

RecursiveUrlLoader now imports from @langchain/core/utils/ssrf, which is available in v1.1.20+. Update the peer dependency lower bound to ensure users cannot install an incompatible version of @langchain/core.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

- feat(core): Add SSRF protection module
- fix(community): Harden RecursiveUrlLoader against SSRF attacks

changeset-bot bot commented Feb 10, 2026

🦋 Changeset detected

Latest commit: 812884e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 13 packages

Name                             Type
@langchain/core                  Minor
@langchain/community             Major
langchain                        Major
@langchain/anthropic             Major
@langchain/google-cloud-sql-pg   Patch
@langchain/google-common         Major
@langchain/google-genai          Major
@langchain/google-webauth        Major
@langchain/model-profiles        Patch
@langchain/standard-tests        Patch
@langchain/google-gauth          Major
@langchain/google-vertexai-web   Major
@langchain/google-vertexai       Major

@github-actions github-actions bot added the community and pkg:@langchain/community labels Feb 10, 2026

- Replace dns/promises and net imports with pure JS implementations
- Add IPv4/IPv6 validation using regex and string parsing (no Node.js deps)
- Remove net.Socket usage from expandIpv6
- Use dynamic import for DNS resolution with graceful fallback for non-Node environments
- All 50 SSRF tests pass
- Module now works in browsers, Cloudflare Workers, Deno, Vercel Edge, etc.
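
As an illustration of what string-only IPv6 handling can look like without `net`, here is a minimal sketch of an expansion helper. It is not the `expandIpv6` from this PR: the name `expandIpv6Sketch` is hypothetical, and embedded IPv4 forms (such as `::ffff:1.2.3.4`) and zone identifiers are deliberately not handled.

```typescript
// Hypothetical pure-string helper, not the code from this PR.
// Expands an IPv6 address to eight lowercase 4-digit hex groups, or returns
// null if the input is not a plain hex-group IPv6 address.
function expandIpv6Sketch(ip: string): string | null {
  const halves = ip.split("::");
  if (halves.length > 2) return null; // "::" may appear at most once

  const head = halves[0] ? halves[0].split(":") : [];
  const tail = halves.length === 2 && halves[1] ? halves[1].split(":") : [];
  const missing = 8 - head.length - tail.length;

  let groups: string[];
  if (halves.length === 2) {
    // "::" must stand in for at least one all-zero group.
    if (missing < 1) return null;
    groups = [...head, ...new Array<string>(missing).fill("0"), ...tail];
  } else {
    // Without "::" exactly eight groups are required.
    if (missing !== 0) return null;
    groups = head;
  }

  // Every group must be 1 to 4 hex digits.
  if (!groups.every((g) => /^[0-9a-fA-F]{1,4}$/.test(g))) return null;
  return groups.map((g) => g.toLowerCase().padStart(4, "0")).join(":");
}

// Example: a loopback check on the expanded form.
const LOOPBACK_V6 = "0000:0000:0000:0000:0000:0000:0000:0001";
console.log(expandIpv6Sketch("::1") === LOOPBACK_V6); // true
```

Normalizing to a fixed-width form first means later range checks can compare prefixes of the expanded string instead of reasoning about every shorthand spelling.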

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

… resolution

- Removed resolveDns function and its Node.js-only dynamic import
- Changed validateSafeUrl from async to sync (Promise<string> -> string)
- Changed isSafeUrl from async to sync (Promise<boolean> -> boolean)
- Updated all test cases to remove 'await' keywords
- Updated RecursiveUrlLoader to remove 'await' before validateSafeUrl call
- Functions now perform only static hostname/IP validation without DNS lookups
- Still blocks private IPs, cloud metadata endpoints, and localhost as before
- Works across all environments (Node.js, browsers, workers) without DNS

Tests: 50 passing
Build: ✓ @langchain/core ✓ @langchain/community
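
The change from async to sync described above can be pictured with the sketch below: a synchronous validator that returns the URL string or throws, plus a non-throwing boolean wrapper. This is a simplified illustration, not the merged module. The `Sketch` names are hypothetical, the rule that only https is accepted unless `allowHttp` is set is an assumption inferred from the `allowHttp: true` call in this PR, and only a few localhost forms are checked.

```typescript
// Hypothetical sketch of a static (no DNS) validator, not the code from this PR.
interface ValidateOptionsSketch {
  allowHttp?: boolean; // option name taken from the PR; default behavior assumed
}

// Synchronous: returns the URL string if it passes, throws otherwise.
function validateSafeUrlSketch(
  url: string,
  options: ValidateOptionsSketch = {}
): string {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    throw new Error(`Invalid URL: ${url}`);
  }

  // Assumption: https only, unless allowHttp is explicitly enabled.
  const allowedSchemes = options.allowHttp ? ["https:", "http:"] : ["https:"];
  if (!allowedSchemes.includes(parsed.protocol)) {
    throw new Error(`Scheme not allowed: ${parsed.protocol}`);
  }

  // Static hostname checks only; the hostname is never resolved.
  const host = parsed.hostname.toLowerCase();
  const isLoopbackV4 = /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host);
  if (host === "localhost" || isLoopbackV4 || host === "[::1]") {
    throw new Error(`Localhost is not allowed: ${host}`);
  }
  return url;
}

// Non-throwing wrapper: converts any validation error into `false`.
function isSafeUrlSketch(
  url: string,
  options: ValidateOptionsSketch = {}
): boolean {
  try {
    validateSafeUrlSketch(url, options);
    return true;
  } catch {
    return false;
  }
}
```

Because nothing here awaits, a caller can validate immediately before `fetch` without an extra `await`. The trade-off of static validation is that a public hostname which resolves to a private address is not detected by checks of this kind.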

Thanks for tackling this important security issue! This is an excellent implementation of SSRF protection that addresses critical vulnerabilities in the RecursiveUrlLoader. The new utilities module is comprehensive and well-tested. LGTM 👍


const prefixLen = parseInt(prefixStr, 10);
if (isNaN(prefixLen)) {
return null;

Consider adding validation for the prefixLen parameter to ensure it's non-negative:

Suggested change
return null;
const prefixLen = parseInt(prefixStr, 10);
if (isNaN(prefixLen) || prefixLen < 0) {
return null;
}
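
To show where such a bounds check sits in a CIDR match, here is a minimal IPv4-only sketch. It is not the PR's code: the helper names are hypothetical, and the upper bound of 32 is an addition beyond the non-negative check suggested above.

```typescript
// Hypothetical sketch, not the code from this PR.
// Convert a dotted-quad IPv4 string to a non-negative integer, or null if malformed.
function ipv4ToIntSketch(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // arithmetic avoids signed 32-bit shifts
  }
  return value;
}

// Parse the prefix length, rejecting NaN, negatives, and (assumption) values above 32.
function parsePrefixLenSketch(prefixStr: string): number | null {
  const prefixLen = parseInt(prefixStr, 10);
  if (isNaN(prefixLen) || prefixLen < 0 || prefixLen > 32) {
    return null;
  }
  return prefixLen;
}

// True if `ip` falls inside the IPv4 block written as "a.b.c.d/len".
function isInCidrSketch(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash === -1) return false;
  const base = ipv4ToIntSketch(cidr.slice(0, slash));
  const prefixLen = parsePrefixLenSketch(cidr.slice(slash + 1));
  const target = ipv4ToIntSketch(ip);
  if (base === null || prefixLen === null || target === null) return false;
  // Addresses share a block when their top `prefixLen` bits are equal.
  const blockSize = 2 ** (32 - prefixLen);
  return Math.floor(base / blockSize) === Math.floor(target / blockSize);
}
```

One caveat: `parseInt` accepts trailing non-digits (for example `"8abc"` parses as 8), so a stricter variant could test the string against `/^\d+$/` before parsing.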

private async getUrlAsDoc(url: string): Promise<Document | null> {
let res;
try {
validateSafeUrl(url, { allowHttp: true });

@christian-bromann Christian Bromann (christian-bromann) Feb 11, 2026


The validateSafeUrl call is synchronous but placed before an async operation. Consider awaiting it for consistency with the other call on line 174:

Suggested change
validateSafeUrl(url, { allowHttp: true });
await validateSafeUrl(url, { allowHttp: true });

@@ -0,0 +1,83 @@
import { test, describe, expect } from "@jest/globals";
import { RecursiveUrlLoader } from "../web/recursive_url.js";

Consider adding an integration test that actually instantiates the loader and verifies the SSRF protection works end-to-end, rather than just testing the logic in isolation.

@github-actions github-actions bot added the ready label Feb 11, 2026
@hntrl Hunter Lovell (hntrl) merged commit d5e3db0 into main Feb 11, 2026
38 of 39 checks passed
@hntrl Hunter Lovell (hntrl) deleted the hunter/ssrf branch February 11, 2026 00:23
Mia (miadisabelle) pushed a commit to avadisabelle/ava-langchainjs that referenced this pull request Feb 18, 2026
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Nick Winder (nickwinder) pushed a commit to nickwinder/langchainjs that referenced this pull request Mar 24, 2026
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>