Skip to content

[pull] master from JamesHeinrich:master#76

Open
pull[bot] wants to merge 167 commits intodevilkun:masterfrom
JamesHeinrich:master
Open

[pull] master from JamesHeinrich:master#76
pull[bot] wants to merge 167 commits intodevilkun:masterfrom
JamesHeinrich:master

Conversation

@pull
Copy link

@pull pull bot commented Dec 22, 2021

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Dec 22, 2021
bvibber and others added 27 commits December 22, 2021 15:17
In 9ce8040 I introduced a divide-by-zero bug into the aspect
ratio handling for MPEG-2 videos. In production at Wikipedia
we've seen a few attempted uploads for files that trigger a
divide by zero on the width, which either means corrupt files
or something's wrong elsewhere.

Unfortunately our logging doesn't include the files being
uploaded, so I don't have a test case to analyze. But this
workaround will at least avoid a fatal error that can't be
caught.

On our bug tracker: https://phabricator.wikimedia.org/T289189
Work around divide by zero bug in MPEG-2 aspect ratio
The action used to install Composer packages and handle the caching has released a new major (and some follow-up patch releases), which means, the action reference needs to be updated to benefit from it.

Refs:
* https://github.com/ramsey/composer-install/releases/tag/2.0.0
* https://github.com/ramsey/composer-install/releases/tag/2.0.1
* https://github.com/ramsey/composer-install/releases/tag/2.0.2
…action

GH Actions: version update for `ramsey/composer-install`
[ASF] Improve support of Header Extension Object data
…place) of type array|string is deprecated
Fix a few PHP 8.1-related deprecations & improve support of animated gif files
Typo fixed in getid3/getid3.php
…, and add to the defences of XML entity injection.

This PR adds LIBXML_WARNING, LIBXML_NONET and LIBXML_COMPACT (if supported), in addition to LIBXML_NOENT, to simplexml_load_string(), which is currently only used for parsing RIFF iXML (http://www.gallery.co.uk/ixml/)

* LIBXML_NONET, to deny access to entites included remotely. This is the most interesting one, as there is other code nearby which is supposed to address the same issue.
* LIBXML_NOWARNING, to suppress recoverable parsing warnings which we can't do anything about (and may even be deliberate)
* If your version of libxml supports it, LIBXML_COMPACT, which is described at https://www.php.net/manual/en/libxml.constants.php/ as "Activate small nodes allocation optimization. This may speed up your     application without needing to change the code." The gotcha is that the DOM is readonly, but we're not using this code to manipulate tags so that should be fine.

I'll explain the rationale for each.

Invalid XML / LIBXML_NOWARNING
==============================

I encountered a .wav file with an iXML studio mastering tag in my collection. Here's an excerpt from the file:

    <?xml version="1.0" encoding="UTF-8"?>
    ...
    <PROJECT>Optiv & CZA - Invisible Things-13 Mixdown-32 NEW 1--22</PROJECT>
    ...
    <BWF_CODING_HISTORY>A=PCM,F=44100,W=16,T=SOUND FORGE Pro 12.1,
    </BWF_CODING_HISTORY>

As you can see, the `<PROJECT>` tag is technically invalid because it has a nameless entity (`&`). The exact warning is
`PHP Warning:  simplexml_load_string(): Entity: line 3: parser error : xmlParseEntityRef: no name`.

This is clearly both recoverable and harmless, and may even be intentional given that Sound Forge Pro 12 is not old (2018).

Suppressing parser warnings seems to fit with the other warning suppressions nearby in the code.

XML Inclusion / LIBXML_NONET
============================

This one's more interesting. In commit `afbdaa04` (2014) additional code was added by the wordpress team with the commit "improved XXE fix". The change adds option LIBXML_NOENT, and references the article http://websec.io/2012/08/27/Preventing-XEE-in-PHP.html.

The problem is that I think LIBXML_NOENT was a typo and should have been LIBXML_NONET. The article does not mention NOENT at all.

Ironically, the NOENT option does the _exact opposite_ of what it sounds like it does - it *enables* entity parsing. Refs:
* https://www.php.net/manual/en/libxml.constants.php - comment "The name of the constant LIBXML_NOENT is very misleading. Adding this flag actually causes the parser to load and insert the external entities. Omitting it leaves the tags untouched, which is probably what you want."
* https://stackoverflow.com/q/38807506/367456 - "What does LIBXML_NOENT do (and why isn't it called LIBXML_ENT)?"

I suspect this was a mistake by the original author. Nevertheless, it's fairly harmless to have entities enabled in the XML as long as it's not possible to do remote inclusions, and that's already disabled with `libxml_disable_entity_loader()`.

Adding LIBXML_NONET (as referenced in the article to improve XXE defences) prevents libxml from using the network.

I did not remove LIBXML_NOENT as theoretically you can define your own entites inline at the top of an XML doc, even though I very much doubt it would work properly if you did, given the first bug addressed in this PR.

Performance / LIBXML_COMPACT
============================

If your version of libxml supports it this flag "Activate small nodes allocation optimization. This may speed up your application without needing to change the code." according to https://www.php.net/manual/en/libxml.constants.php . The gotcha is that the returned DOM object is readonly, but we're not using this code to manipulate a DOM so that should be fine.
Define options to suppress warnings generated by invalid iXML content, and add to the defences of XML entity injection.
The type has been identified. ($BitrateCache[$FrameLength]])
JamesHeinrich and others added 30 commits December 16, 2025 11:38
Add conditional PHP_INT_MIN define for older PHP versions (<v5.6?)
sync PHP version requirements readme.txt -> readme.md
PHP_INT_MIN is not introduced until PHP v7.0.0
Use tab indentation instead of mixed space-tab
Really should be <= PHP_INT_MAX and >= PHP_INT_MIN, but trying "(int)9.2233720368548E+18" results in PHP warning "The float 9.2233720368548E+18 is not representable as an int, cast occurred", a slightly more restrictive range avoids the edge-case warnings.

#477
uncompressed_audio_bytes = 0 would fatal, but is valid in FLAC files
where the total_samples field in the STREAMINFO is set to 0.
This is common for live recordings
Because the id3v1 parser does a negative 128 byte fseek, it can only
parse files that are longer than 128 bytes. Some midi files are
shorter than that and could not be parsed.
When there are no tempo events, id3 would report a playtime of 0
seconds. Instead, we can assume the default of 120BPM and use the
ticks to calculate the duration.
ID3v1 should not parse files shorter than 128 bytes
MIDI: Fix duration for files without tempo events
MIDI: Check for minimum filessize of a valid MIDI header
Allow metadata extraction from streamable FLAC files
Streamable files generally do not provide Duration via the header.
In order to support figuring out the duration of these files, we can
scan clusters from the start and the end of the file and give an
estimate that way.

Preserve parse_whole_file and hide_clusters functionality.
Keep track of why we are scanning the file, with an internal scan mode
variable. This makes it slightly more readable.
This would throw PHP Warning: Undefined array key "TrackUID" when
feeding the analyzer a partial webm file.
Stop calling imagedestroy in PHP8+
Matroska: scan for duration when not provided by metadata
#486
Avoid false detection of MP3 data in WAV file
`$computed_time` was reset to zeros after GPSDateStamp values were written,
causing the `$info['jpg']['exif']['GPS']['computed']['timestamp']` to use only the time portion (month/day/year = 0).
fix: GPS computed timestamp ignoring date due to array reinitialization
remove LIBXML_NOENT flag
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.