max_session_rotations not being respected in AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser #1484

@ericvg97

Description

Hi, it's me again, sorry for the number of issues I am opening :) I think this one is a bug. I am using this configuration:

from crawlee import ConcurrencySettings
from crawlee.crawlers import AdaptivePlaywrightCrawler

# Run one task at a time, capped at 200 tasks per minute.
concurrency_settings = ConcurrencySettings(
    min_concurrency=1,
    max_concurrency=1,
    desired_concurrency=1,
    max_tasks_per_minute=200,
)

crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
    max_requests_per_crawl=10000,
    playwright_crawler_specific_kwargs={
        "browser_type": "chromium",
        "headless": True,
    },
    # Expected: give up on a request after 10 session rotations.
    max_session_rotations=10,
    retry_on_blocked=True,
    concurrency_settings=concurrency_settings,
    use_session_pool=True,
    max_request_retries=5,
    keep_alive=True,
)

If I crawl the URL "http://httpbingo.org/status/429", the crawler logs "Assuming the session is blocked based on HTTP status code 429", rotates the session, and retries. It keeps doing this indefinitely instead of stopping after the 10 rotations allowed by max_session_rotations.
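To make the expected behaviour concrete, here is a minimal, library-free sketch of how I would expect a rotation limit to work. It is not crawlee's actual implementation: `FakeRequest`, `fetch_status`, `crawl_with_bounded_rotation`, `SessionRotationLimitError`, and the set of blocked status codes are all hypothetical names and assumptions made up for illustration.

```python
from dataclasses import dataclass

# Assumption for this sketch: these status codes mean "session is blocked".
BLOCKED_STATUS_CODES = {401, 403, 429}


class SessionRotationLimitError(Exception):
    """Hypothetical error raised once the rotation budget is used up."""


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a crawler request with a rotation counter."""
    url: str
    session_rotation_count: int = 0


def fetch_status(url: str) -> int:
    """Stand-in for a real HTTP call: returns the status code at the end of the URL path."""
    return int(url.rstrip("/").rsplit("/", 1)[-1])


def crawl_with_bounded_rotation(request: FakeRequest, max_session_rotations: int) -> int:
    """Retry with a fresh session while blocked, but never more than the allowed rotations."""
    while True:
        status = fetch_status(request.url)
        if status not in BLOCKED_STATUS_CODES:
            return status
        if request.session_rotation_count >= max_session_rotations:
            # The budget is exhausted: fail the request instead of looping forever.
            raise SessionRotationLimitError(
                f"{request.url} still blocked after "
                f"{request.session_rotation_count} session rotations"
            )
        # "Rotate" the session: here this only increments the counter.
        request.session_rotation_count += 1


if __name__ == "__main__":
    blocked = FakeRequest("http://httpbingo.org/status/429")
    try:
        crawl_with_bounded_rotation(blocked, max_session_rotations=10)
    except SessionRotationLimitError as exc:
        print(exc)
```

With `max_session_rotations=10`, the sketch rotates the session ten times and then fails the request. That is the behaviour I expected from the crawler configuration above, rather than endless rotation.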

Metadata

Labels

bug: Something isn't working.
t-tooling: Issues with this label are in the ownership of the tooling team.
