Prefer filename over filename* in multipart parser #2398

Merged
ioquatix merged 5 commits into rack:main from wtn:filename on Nov 2, 2025

@wtn
Contributor

@wtn wtn commented Oct 21, 2025

Rack multipart prefers filename* over filename when both are present in a Content-Disposition header.

But filename* is for HTTP headers only, so the current behavior (proposed in #1762, merged in #1789) should be reversed. Rack should only look at filename* if filename is not present. See the note in RFC 7578 §4.2 for background.
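
The proposed preference order can be sketched as follows. This is a minimal illustration, not Rack's actual parser code: `pick_filename` is a hypothetical helper, and it assumes the Content-Disposition parameters of a part have already been parsed into a Hash keyed by parameter name.

```ruby
# Hypothetical helper (not Rack's implementation): choose the file name from
# the already-parsed Content-Disposition parameters of one multipart part.
def pick_filename(params)
  # Prefer the plain `filename` parameter whenever it is present.
  return params["filename"] if params.key?("filename")

  # Look at `filename*` only when `filename` is absent.
  params["filename*"]
end

both      = { "name" => "file", "filename" => "report.txt", "filename*" => "UTF-8''other.txt" }
only_star = { "name" => "file", "filename*" => "UTF-8''notes.txt" }

pick_filename(both)      # => "report.txt"
pick_filename(only_star) # => "UTF-8''notes.txt" (left undecoded in this sketch)
```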

For a future major release, I suggest returning 400 when a request has filename* in a multipart form, because practically all such requests would have malicious intent. At minimum, filename* should be stripped out when filename is present, to prevent security boundary mismatches.

Contributor

@jeremyevans jeremyevans left a comment

Thank you for your report. I would go so far as to say we shouldn't consider filename* at all during multipart parsing.

When you mention "filename* should be stripped out when filename is present", can you be specific about what it should be stripped out from? I think the only place it would be present is in the :head entries of the multipart parts, and those are almost never accessed IME.

Doing some research, the MDN documentation was not specific about avoiding filename* for headers in multipart bodies at the time that #1762 was submitted (http://web.archive.org/web/20210730214530/https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition). The MDN documentation has been updated since to make it clear filename* should not be used during multipart parsing. With the current MDN documentation, the change would not have been merged originally.

@jeremyevans jeremyevans requested a review from ioquatix October 21, 2025 23:51
@wtn
Contributor Author

wtn commented Oct 23, 2025

> When you mention "filename* should be stripped out when filename is present", can you be specific about what it should be stripped out from? I think the only place it would be present is in the :head entries of the multipart parts, and those are almost never accessed IME.

You are correct, my suggestion can be disregarded.

@ioquatix
Member

ioquatix commented Oct 23, 2025

I don't have a strong opinion about this, but I do wonder if this is the right thing to do. It seems inconsistent to me to parse it differently depending on context. I generally agree with following the standards. So before merging, I'd like to hear from @chiwenchen (if possible) regarding whether this was actually used in practice. Because if we remove this, I don't see any alternative for encoding file names using UTF-8 etc. (unless it became acceptable to put encoded names into filename?).

I'm slightly worried that this will silently break things. It may be better to issue a warning instead so we can confirm it's not in use?
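
One way the warning idea could look, as a rough sketch only: `filename_with_warning` is a hypothetical helper (nothing like it is confirmed to exist in Rack), and the `io:` keyword and the message text are assumptions made for illustration.

```ruby
require "stringio"

# Hypothetical helper, not part of Rack: pick the file name with `filename`
# preferred, but log whenever a part carries `filename*` so that real-world
# usage becomes visible before the behavior changes.
def filename_with_warning(params, io: $stderr)
  if params.key?("filename*")
    io.puts "warning: multipart part includes filename* (disallowed by RFC 7578)"
  end

  # `filename` first, `filename*` only as a fallback.
  params.fetch("filename") { params["filename*"] }
end

log = StringIO.new
name = filename_with_warning({ "filename" => "a.txt", "filename*" => "UTF-8''b.txt" }, io: log)
# name is "a.txt", and log.string now holds one warning line
```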

@ioquatix
Member

Okay, I did some more digging:

From https://www.rfc-editor.org/rfc/rfc8187.html

> By default, header field values in Hypertext Transfer Protocol (HTTP)
> messages cannot easily carry characters outside the US-ASCII coded
> character set. RFC 2231 defines an encoding mechanism for use in
> parameters inside Multipurpose Internet Mail Extensions (MIME) header
> field values. This document specifies an encoding suitable for use
> in HTTP header fields that is compatible with a simplified profile of
> the encoding defined in RFC 2231.

So field*=value variants exist to work around the ASCII-only limitation of HTTP headers. IOW, filename* is a standards-compliant escape hatch for non-ASCII filenames in HTTP headers, invented because HTTP header fields themselves must be pure ASCII.
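
For reference, an RFC 8187 extended value has the shape `charset'language'percent-encoded-bytes`, for example `UTF-8''%e2%82%ac%20rates`. The sketch below decodes that form; `decode_ext_value` is a hypothetical helper written for illustration (it is not Rack's code) and it assumes only UTF-8 needs to be supported.

```ruby
# Hypothetical helper: decode an RFC 8187 ext-value such as
# "UTF-8''%e2%82%ac%20rates". Returns nil for anything it cannot handle.
def decode_ext_value(ext_value)
  charset, _language, encoded = ext_value.split("'", 3)
  return nil unless encoded && charset.casecmp?("UTF-8")

  # Replace each %XX escape with its raw byte, then label the bytes as UTF-8.
  bytes = encoded.b.gsub(/%(\h\h)/) { [$1.hex].pack("C") }
  decoded = bytes.force_encoding(Encoding::UTF_8)
  decoded.valid_encoding? ? decoded : nil
end

decode_ext_value("UTF-8''%e2%82%ac%20rates") # => "€ rates"
decode_ext_value("plain.txt")                 # => nil (not an ext-value)
```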

Since that limitation does not apply to MIME bodies, such a parameter is not necessary there: we can use filename with UTF-8 directly, and this is called out in https://www.rfc-editor.org/rfc/rfc7578.txt

> In most multipart types, the MIME header fields in each part are
> restricted to US-ASCII; for compatibility with those systems, file
> names normally visible to users MAY be encoded using the percent-
> encoding method in Section 2, following how a "file:" URI
> [URI-SCHEME] might be encoded.
>
> NOTE: The encoding method described in [RFC5987], which would add a
> "filename*" parameter to the Content-Disposition header field, MUST
> NOT be used.
>
> Some commonly deployed systems use multipart/form-data with file
> names directly encoded including octets outside the US-ASCII range.
> The encoding used for the file names is typically UTF-8, although
> HTML forms will use the charset associated with the form.

So percent-encoded filename is acceptable, and "some commonly deployed systems" use UTF-8 directly.
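
To make the two accepted forms concrete, here is a rough sketch of normalizing a plain `filename` value that may arrive either percent-encoded or as raw UTF-8. `normalize_filename` is a hypothetical helper, and the rule of decoding only when every `%` begins a valid `%XX` escape is one possible heuristic assumed for this example, not something the thread confirms as Rack's exact behavior.

```ruby
# Hypothetical helper: accept a `filename` value that is either
# percent-encoded or raw UTF-8 and return a UTF-8 labelled String.
def normalize_filename(raw)
  name = raw.dup

  # Percent-decode only if every "%" begins a valid %XX escape, so that a
  # literal name such as "100%.txt" is left untouched.
  if name.include?("%") && name.scan(/%.?.?/).all? { |s| s.match?(/\A%\h\h\z/) }
    name = name.b.gsub(/%(\h\h)/) { [$1.hex].pack("C") }
  end

  name.force_encoding(Encoding::UTF_8)
end

normalize_filename("%E2%82%AC.txt") # => "€.txt"
normalize_filename("€.txt")         # => "€.txt"
normalize_filename("100%.txt")      # => "100%.txt"
```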

Therefore, I'm okay with merging this PR, but I don't think it's a good idea to backport it. We also need a changelog entry (probably marked as breaking).

Member

@ioquatix ioquatix left a comment

Please update the changelog, thanks.

wtn and others added 4 commits November 2, 2025 13:59
…eaders.

The `filename*` parameter is for HTTP headers only and should not be
used in multipart/form-data. When both `filename` and `filename*` are
present in a multipart form Content-Disposition header, prefer the standard
`filename` parameter.
@ioquatix ioquatix merged commit a36655e into rack:main Nov 2, 2025
17 checks passed
@wtn wtn deleted the filename branch November 2, 2025 03:27