Closed
Labels: bug (Something isn't working.), t-tooling (Issues with this label are in the ownership of the tooling team.)
Description
When working with large proxy pools (e.g., Apify RESIDENTIAL), I observe significant RAM usage growth. Memory consumption increases by more than a gigabyte within a few minutes during active scraping.
My version:
The issue appears to be caused by the crawler creating a separate HTTP session for each proxy (https://github.com/apify/crawlee-python/blob/master/src/crawlee/http_clients/_httpx.py#L132), combined with the high default maximum `SessionPool` size of 1000 sessions. As a result, during its initial run the crawler creates a new HTTP session for almost every new request.
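To illustrate the growth pattern, here is a simplified model. It is not crawlee's actual code: `FakeHttpClient` and `PerProxyClientCache` are hypothetical names, the proxy URLs are made up, and it assumes that every session in the pool ends up with a distinct proxy URL. A client is created the first time each proxy URL is seen and is never released, so the number of live clients tracks the number of distinct proxy URLs rather than the number of concurrent requests.

```python
from typing import Dict, Optional


class FakeHttpClient:
    """Hypothetical stand-in for a real HTTP client that owns its own connection pool."""

    def __init__(self, proxy_url: Optional[str]) -> None:
        self.proxy_url = proxy_url
        self.closed = False

    def close(self) -> None:
        self.closed = True


class PerProxyClientCache:
    """Simplified model of a per-proxy client cache with no eviction or cleanup."""

    def __init__(self) -> None:
        self._clients: Dict[Optional[str], FakeHttpClient] = {}

    def get_client(self, proxy_url: Optional[str]) -> FakeHttpClient:
        # A new client is created the first time a proxy URL is seen
        # and is kept for as long as the cache itself lives.
        if proxy_url not in self._clients:
            self._clients[proxy_url] = FakeHttpClient(proxy_url)
        return self._clients[proxy_url]

    def __len__(self) -> int:
        return len(self._clients)


# Simulate 1000 sessions, each bound to a different (made-up) proxy URL,
# as can happen with a large residential pool and a max pool size of 1000.
cache = PerProxyClientCache()
for session_id in range(1000):
    cache.get_client(f"http://session-{session_id}@proxy.example:8000")

print(len(cache))  # 1000 clients, none of them ever closed
```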
The absence of cleanup logic for the created HTTP sessions will likely make things worse when the proxy pool contains a large number of "bad" proxies, since sessions that are retired and replaced would leave their HTTP sessions behind.
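One possible shape for such cleanup logic, sketched under stated assumptions: the names `FakeHttpClient`, `BoundedClientCache`, `max_clients` and `remove` are hypothetical and not part of crawlee's API. The cache keeps at most `max_clients` clients in least-recently-used order, closes the oldest one when the limit is exceeded, and lets a caller explicitly close the client of a proxy that has been retired as bad.

```python
from collections import OrderedDict
from typing import Optional


class FakeHttpClient:
    """Hypothetical stand-in for a real HTTP client that owns its own connection pool."""

    def __init__(self, proxy_url: Optional[str]) -> None:
        self.proxy_url = proxy_url
        self.closed = False

    def close(self) -> None:
        self.closed = True


class BoundedClientCache:
    """LRU cache of per-proxy clients that closes clients it evicts (illustrative only)."""

    def __init__(self, max_clients: int) -> None:
        self._max_clients = max_clients
        self._clients: "OrderedDict[Optional[str], FakeHttpClient]" = OrderedDict()

    def get_client(self, proxy_url: Optional[str]) -> FakeHttpClient:
        client = self._clients.get(proxy_url)
        if client is not None:
            # Mark as most recently used.
            self._clients.move_to_end(proxy_url)
            return client
        client = FakeHttpClient(proxy_url)
        self._clients[proxy_url] = client
        if len(self._clients) > self._max_clients:
            # Evict and close the least recently used client.
            _, evicted = self._clients.popitem(last=False)
            evicted.close()
        return client

    def remove(self, proxy_url: Optional[str]) -> None:
        """Close and drop the client for a proxy, e.g. when its session is retired as bad."""
        client = self._clients.pop(proxy_url, None)
        if client is not None:
            client.close()

    def __len__(self) -> int:
        return len(self._clients)


cache = BoundedClientCache(max_clients=50)
first = cache.get_client("http://session-0@proxy.example:8000")
for session_id in range(1, 1000):
    cache.get_client(f"http://session-{session_id}@proxy.example:8000")

print(len(cache), first.closed)  # 50 True
```

With a real `httpx.AsyncClient`, closing would go through `await client.aclose()` instead of a synchronous `close()`, and evicting a client that still has requests in flight would need extra care (for example, deferring the close until those requests finish).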