Remove DOTALL flag from parse_wp_config() regexes #64
swissspidy merged 10 commits into wp-cli:main
Hello! 👋 Thanks for opening this pull request! Please check out our contributing guidelines. We appreciate you taking the initiative to contribute to this project. Contributing isn't limited to just code. We encourage you to contribute in the way that best fits your abilities, by writing tutorials, giving a demo at your local meetup, helping other users with their support questions, or revising our documentation.

Here are some useful Composer commands to get you started. To run a single Behat test, you can use the following commands:

# Run all tests in a single file
composer behat features/some-feature.feature
# Run only a specific scenario (where 123 is the line number of the "Scenario:" title)
composer behat features/some-feature.feature:123

You can find a list of all available Behat steps in our handbook.
Code Review
This pull request modifies the regex patterns in WPConfigTransformer.php by removing the global /s (DOTALL) flag to prevent over-matching on complex expressions like string concatenations. It also adds a new test suite and fixture to verify the parsing of variables and constants in these cases. A review comment points out that this change causes a regression for string values containing literal newlines and suggests using inline (?s:...) flags within the quoted string segments to maintain support for multiline values without the global over-matching issue.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Thanks for the PR! I'll take a closer look after the weekend.
Pull request overview
This PR adjusts WPConfigTransformer::parse_wp_config() regex parsing to avoid consuming subsequent assignments when a value contains non-literal expressions (notably concatenation), and adds regression coverage for that scenario (Fixes #63).
Changes:
- Updated constant/variable matching regexes to remove the DOTALL (`s`) flag and replaced the quoted-string alternations with unrolled-loop patterns.
- Added a new wp-config fixture exercising concatenation and multiline string literals.
- Added a PHPUnit test validating `exists()`/`get_value()` behavior for variables/constants following concatenation and multiline strings, and updated PHPCS exclusions for the new fixture.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/WPConfigTransformer.php | Updates regex parsing to avoid cross-statement consumption and improve quoted-string matching. |
| tests/fixtures/wp-config-concat.php | New fixture with concatenation and multiline string literal cases. |
| tests/ConcatenationTest.php | New regression tests ensuring later variables/constants are still discoverable. |
| phpcs.xml.dist | Excludes the new fixture from PHPCS scanning. |
Use [\s\S]*? instead of .*? in the regex fallback so multiline raw expressions (array defines, multi-line concatenation, ternary) are still matched after removing the /s flag. The unrolled-loop quoted patterns prevent the original cross-statement matching bug. Move ConcatenationTest temp file to tests/ root to match the convention used by all other tests and to be covered by .gitignore. Add test cases for multiline concatenation, multiline array define, and variable resolution after a multiline raw define.
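The fallback change can be illustrated with a minimal sketch in Python's `re` module (the project itself uses PHP's PCRE, so this is only an approximation of the behaviour). The pattern is a simplified stand-in for the define regex, not the one in `WPConfigTransformer.php`, and `match_define` is a hypothetical helper introduced for this example.

```python
import re

# A multi-line array define, similar in spirit to the new test cases.
SOURCE = "define( 'ALLOWED', array(\n    'a',\n    'b',\n) );\n"


def match_define(fallback):
    """Match a simplified define() statement using the given value fallback."""
    pattern = r"^\s*define\s*\(\s*'(\w+)'\s*,\s*(" + fallback + r")\s*\)\s*;"
    # MULTILINE only: DOTALL is deliberately not set.
    return re.search(pattern, SOURCE, re.M)


dot_match = match_define(r".*?")       # '.' cannot cross a newline
any_match = match_define(r"[\s\S]*?")  # [\s\S] matches any character

print(dot_match)           # no match for the '.' fallback
print(any_match.group(2))  # the full multi-line array expression
```

Without DOTALL, the `.*?` fallback cannot reach the closing `)` on a later line, while `[\s\S]*?` still captures the whole expression.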
Replace \. with [\s\S] in the unrolled-loop escape subpattern so a literal backslash immediately followed by a newline inside a quoted string is matched by the quoted-string branch rather than falling through to the general fallback. Add test cases for backslash-newline in a quoted string and for variable resolution after such a string.
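A small sketch of the backslash-newline case, again using Python's `re` as a stand-in for PCRE. Only the single-quoted branch is shown, and the variable names are illustrative rather than taken from the PR.

```python
import re

# A quoted string whose body contains a backslash directly followed by a newline.
VALUE = "'a\\\nb'"

# Escape subpattern written as \\. (would need DOTALL to cross the newline).
WITH_DOT = r"'[^'\\]*(?:\\.[^'\\]*)*'"
# Escape subpattern written as \\[\s\S] (newline-safe without any flag).
WITH_ANY = r"'[^'\\]*(?:\\[\s\S][^'\\]*)*'"

dot_match = re.fullmatch(WITH_DOT, VALUE)
any_match = re.fullmatch(WITH_ANY, VALUE)

print(dot_match is None)      # the \\. form rejects the string
print(any_match is not None)  # the \\[\s\S] form accepts it
```

With the `\\.` form the quoted-string branch fails, so in the full regex the value would fall through to the general fallback, which is the situation this commit avoids.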
Use a data provider to verify get_value returns the correct captured value for concatenation expressions, multiline strings, multiline raw expressions, and backslash-newline strings. Add an update test that modifies a multiline entry and verifies the new value is read back correctly and subsequent entries survive.
get_value() does not normalize line endings (unlike exists()), so on Windows the returned value contains \r\n from the fixture file. Normalize before comparing to make assertions platform-independent.
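The normalization step amounts to something like the following Python sketch. The actual test is written in PHP with PHPUnit; `normalize_newlines` is a hypothetical helper name and the sample values are made up for illustration.

```python
def normalize_newlines(text):
    """Convert CRLF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The same multiline value as read from a fixture with CRLF vs. LF endings.
crlf_value = "'line one\r\nline two'"
lf_value = "'line one\nline two'"

print(normalize_newlines(crlf_value) == lf_value)
```

Normalizing the actual value before comparison lets one expected string serve both platforms.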
@swissspidy, I addressed Copilot's feedback and included additional test cases. I am by no means a regex expert, so my confidence comes mostly from trying to identify edge cases to test. If there's anything else you think should be added as coverage, let me know.
wp-cli/wp-config-transformer#64 is merged but not yet released. Using the VCS source + dev-main so CI can run against the fixed code. Will revert to a version constraint once a release is cut.
Summary
Fixes #63: concatenation expressions like `$x = 'foo' . $bar;` caused `parse_wp_config()` to match across statement boundaries, making `exists()`, `get_value()`, and other methods fail to find entries that followed.

Root cause
The old quoted-string patterns (`'.*?[^\\]'` / `".*?[^\\]"`) with the `/s` (DOTALL) flag could match from the opening quote of one statement all the way into a subsequent statement when the value wasn't a simple quoted string (e.g. contained concatenation operators).

How the regex works now
The value-matching portion of each regex uses three alternation branches, tried left to right:
1. `''` or `""` (literal empty values)
2. `'[^'\\]*(?:\\[\s\S][^'\\]*)*'` (and the double-quote equivalent)
   - `[^'\\]*` matches any run of characters that aren't a quote or backslash (including newlines, so multiline string values work)
   - `(?:\\[\s\S][^'\\]*)*` matches escape sequences: a backslash followed by any character (including newline), then another run of non-quote/non-backslash characters. Repeats zero or more times.
3. `[\s\S]*?` matches everything else (function calls, arrays, ternaries, concatenation expressions) lazily up to the closing `)` or `;`. Uses `[\s\S]` instead of `.` so multiline raw expressions (e.g. multi-line array defines) still match without the `/s` flag.

The `/s` (DOTALL) flag is removed entirely. Newline matching is handled explicitly: `[\s\S]` where needed, and `[^'\\]` / `[^"\\]` character classes which inherently match newlines.

What's tested
The `ConcatenationTest` covers:
- `get_value()` correctness for all edge cases (captured value groups)
- `update()` on a multiline entry with post-update readback
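
For readers who want to see the root cause and the fix side by side, here is a minimal sketch using Python's `re` module in place of PHP's PCRE. The surrounding variable-assignment pattern is a simplified assumption, not the exact regex in `WPConfigTransformer.php`; `find_vars` is a hypothetical helper. Only the value alternations are taken from the description above.

```python
import re

# Old value alternation (relied on the DOTALL flag).
OLD_VALUE = r"""''|""|'.*?[^\\]'|".*?[^\\]"|.*?"""

# New value alternation: unrolled-loop quoted strings plus a [\s\S]*? fallback.
SQ = r"'[^'\\]*(?:\\[\s\S][^'\\]*)*'"
DQ = r'"[^"\\]*(?:\\[\s\S][^"\\]*)*"'
NEW_VALUE = "''|\"\"|" + SQ + "|" + DQ + r"|[\s\S]*?"

# A concatenation followed by an ordinary assignment, as in issue #63.
SOURCE = "$x = 'foo' . $bar;\n$y = 'baz';\n"


def find_vars(value_pattern, flags):
    """Return (name, raw value) pairs found by a simplified variable regex."""
    pattern = r"^\s*\$(\w+)\s*=\s*(" + value_pattern + r")\s*;"
    return [(m.group(1), m.group(2)) for m in re.finditer(pattern, SOURCE, flags)]


old_result = find_vars(OLD_VALUE, re.M | re.S)  # DOTALL on, old patterns
new_result = find_vars(NEW_VALUE, re.M)         # DOTALL off, new patterns

print(old_result)  # one entry for x whose value swallows the $y statement
print(new_result)  # separate entries for x and y
```

In this sketch the old quoted-string branch backtracks past `'foo'` until it finds a quote followed by `;`, which only happens at the end of the `$y` line, so `$y` is never reported. The unrolled-loop branch cannot extend past the closing quote, so the value falls through to the lazy fallback and stops at the first `;`.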