Last response on a --max-redirect limit is not written to WARC #487

@JustAnotherArchivist

When using --max-redirect, wpull permits at most that number of redirects before erroring out. However, the redirect that reaches the limit and therefore triggers the error is never written to the WARC output. For common redirect loops, that is not a major issue since there will usually be multiple identical redirect responses. But that's not always the case. It also means that it's impossible to only capture a redirect with wpull without ever following it: --max-redirect 0 will correctly raise an error on the first 3xx response, but the WARC will only contain the request. (This, for example, means it's not possible to work around #425 by first fetching the redirects and then running the redirect targets in a separate process.) See also #390 for a similar bug where a syntactically fine but semantically problematic redirect response isn't written to WARC.
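To make the expected behaviour concrete, here is a minimal sketch of a redirect loop that records each response before checking the limit. It is a toy model, not wpull's actual code: `fetch_with_limit`, `TooManyRedirects`, the in-memory `responses` table, and the `recorded` list (standing in for the WARC writer) are all hypothetical names introduced for illustration.

```python
class TooManyRedirects(Exception):
    """Raised when a redirect would exceed the configured limit."""


REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def fetch_with_limit(url, responses, max_redirect, recorded):
    """Follow redirects starting at `url`, recording every response.

    `responses` is a hypothetical stand-in for the network: it maps a URL
    to a (status, location) pair. `recorded` stands in for the WARC output.
    """
    redirects_followed = 0
    while True:
        status, location = responses[url]
        # Record first, so the response that trips the limit is kept too.
        recorded.append((url, status, location))
        if status not in REDIRECT_STATUSES:
            return url, status
        if redirects_followed >= max_redirect:
            raise TooManyRedirects(url)
        redirects_followed += 1
        url = location


# Toy data: one redirect followed by a normal response.
responses = {
    "http://a.example/": (301, "http://b.example/"),
    "http://b.example/": (200, None),
}

recorded = []
try:
    # Analogous to --max-redirect 0: error out on the first 3xx.
    fetch_with_limit("http://a.example/", responses, 0, recorded)
    limit_hit = False
except TooManyRedirects:
    limit_hit = True
```

In this sketch the limit of 0 still raises on the first 3xx, but `recorded` already holds the 301 response, which is the behaviour this issue asks for in the WARC output.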
