Fix HTML parsing to support unquoted attribute values #103
Merged
9 commits
1e0b24a Initial plan (Copilot)
c1f60c7 Implement support for unquoted HTML attribute values (Copilot)
d9038fa Clean up binaries and add documentation examples (Copilot)
893b3c3 Implement pugixml-based unquoted attribute parsing with memmove approach (Copilot)
9541ab7 Fix basic unquoted attribute parsing, debug multiple attributes case (Copilot)
126e83e Complete unquoted attribute parsing implementation with comprehensive… (Copilot)
957e408 Remove binary files and fix document type variable inconsistency (Copilot)
49595ac Address PR feedback: fix indentation, remove unused elements, support… (Copilot)
eeeaf1c fix nits (yorkie)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| <!DOCTYPE html> | ||
| <html> | ||
| <head> | ||
| <title>Unquoted Attributes Demo</title> | ||
| <meta charset=UTF-8> | ||
| </head> | ||
| <body style="background-color: #fff;"> | ||
| <h1>HTML Unquoted Attributes Support Demo</h1> | ||
| | ||
| <p>This page demonstrates various HTML elements with unquoted attributes that are now supported:</p> | ||
| | ||
| <!-- Basic unquoted attribute --> | ||
| <a href=foobar>Basic unquoted href</a> | ||
| | ||
| <!-- Multiple unquoted attributes --> | ||
| <div class=container id=main style=color:red> | ||
| Container with multiple unquoted attributes | ||
| </div> | ||
| | ||
| <!-- Mixed quoted and unquoted attributes --> | ||
| <img src="images/jsar-logo-00.png" alt="quoted title" title='single-quoted' width=100 height=100> | ||
| | ||
| <!-- Self-closing tags with unquoted attributes --> | ||
| <input type=text name=username placeholder=username /> | ||
| <input type=checkbox checked name=remember value=true /> | ||
| | ||
| <!-- Form elements --> | ||
| <form action=submit.php method=post> | ||
| <input type=email name=email required> | ||
| <input type=password name=password minlength=8> | ||
| <button type=submit>Submit</button> | ||
| </form> | ||
| | ||
| <!-- Complex nested HTML with unquoted attributes --> | ||
| <section class=content id=main-content> | ||
| <article class=post data-id=123> | ||
| <header class=post-header> | ||
| <h2 class=post-title>Article Title</h2> | ||
| <time datetime=2023-01-01 class=post-date>January 1, 2023</time> | ||
| </header> | ||
| <div class=post-content> | ||
| <p style=font-size:14px>This paragraph has inline styles with unquoted attributes.</p> | ||
| <a href=page.html class=link-external target=_blank>External Link</a> | ||
| </div> | ||
| </article> | ||
| </section> | ||
| </body> | ||
| </html> |
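The demo page above mixes double-quoted, single-quoted, unquoted, and value-less attributes. As a rough illustration of how those forms can be told apart, here is a minimal standalone sketch. It is not the parser change made in this PR: `scan_attributes` is a hypothetical helper, and it assumes an unquoted value simply runs until whitespace or `>`, with no whitespace around `=`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of this PR): splits one start tag, such as
// "<a href=foobar>", into (name, value) pairs. Value-less attributes get "".
// Assumption: no whitespace around '=', and no entity decoding.
std::vector<std::pair<std::string, std::string>> scan_attributes(const std::string &tag)
{
  std::vector<std::pair<std::string, std::string>> attrs;
  auto is_space = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
  std::size_t i = 0, n = tag.size();

  // Skip '<' and the tag name.
  if (i < n && tag[i] == '<') ++i;
  while (i < n && !is_space(tag[i]) && tag[i] != '>' && tag[i] != '/') ++i;

  while (i < n)
  {
    // Skip whitespace and a self-closing slash between attributes.
    while (i < n && (is_space(tag[i]) || tag[i] == '/')) ++i;
    if (i >= n || tag[i] == '>') break;

    // The attribute name runs until '=', whitespace, '/' or '>'.
    std::size_t name_start = i;
    while (i < n && tag[i] != '=' && !is_space(tag[i]) && tag[i] != '>' && tag[i] != '/') ++i;
    std::string name = tag.substr(name_start, i - name_start);

    std::string value;
    if (i < n && tag[i] == '=')
    {
      ++i;
      if (i < n && (tag[i] == '"' || tag[i] == '\''))
      {
        // Quoted value: runs to the matching quote character.
        char quote = tag[i++];
        std::size_t start = i;
        while (i < n && tag[i] != quote) ++i;
        value = tag.substr(start, i - start);
        if (i < n) ++i; // consume the closing quote
      }
      else
      {
        // Unquoted value: runs until whitespace or '>'.
        std::size_t start = i;
        while (i < n && !is_space(tag[i]) && tag[i] != '>') ++i;
        value = tag.substr(start, i - start);
      }
    }
    attrs.emplace_back(name, value);
  }
  return attrs;
}
```

Under these assumptions, calling `scan_attributes` on the `<input type=checkbox checked name=remember value=true />` line from the demo would yield four pairs, with `checked` mapped to an empty value.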
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| #define CATCH_CONFIG_MAIN | ||
| #include "../catch2/catch_amalgamated.hpp" | ||
| #include <pugixml/pugixml.hpp> | ||
| #include <string> | ||
| | ||
| using namespace std; | ||
| | ||
| TEST_CASE("pugixml unquoted attributes parsing", "[HTML][Parsing]") | ||
| { | ||
| SECTION("Basic unquoted attribute") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("<a href=foobar></a>", | ||
| pugi::parse_default | pugi::parse_unquoted_attributes); | ||
| | ||
| REQUIRE(result); | ||
| REQUIRE(string(doc.child("a").attribute("href").value()) == "foobar"); | ||
| } | ||
| | ||
| SECTION("Multiple unquoted attributes") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("<div class=container id=main></div>", | ||
| pugi::parse_default | pugi::parse_unquoted_attributes); | ||
| | ||
| REQUIRE(result); | ||
| auto div = doc.child("div"); | ||
| REQUIRE(string(div.attribute("class").value()) == "container"); | ||
| REQUIRE(string(div.attribute("id").value()) == "main"); | ||
| } | ||
| | ||
| SECTION("Mixed quoted and unquoted attributes") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("<a href=foobar title=\"quoted title\" class='single-quoted'></a>", | ||
| pugi::parse_default | pugi::parse_unquoted_attributes); | ||
| | ||
| REQUIRE(result); | ||
| auto a = doc.child("a"); | ||
| REQUIRE(string(a.attribute("href").value()) == "foobar"); | ||
| REQUIRE(string(a.attribute("title").value()) == "quoted title"); | ||
| REQUIRE(string(a.attribute("class").value()) == "single-quoted"); | ||
| } | ||
| | ||
| SECTION("Self-closing tag with unquoted attributes") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("<input type=text />", | ||
| pugi::parse_default | pugi::parse_unquoted_attributes); | ||
| | ||
| REQUIRE(result); | ||
| auto input = doc.child("input"); | ||
| REQUIRE(string(input.attribute("type").value()) == "text"); | ||
| } | ||
| | ||
| // Note: Boolean attributes with unquoted syntax not yet supported | ||
| /* | ||
| SECTION("Boolean attributes remain unchanged") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("<input type=checkbox checked>", | ||
| pugi::parse_default | pugi::parse_unquoted_attributes); | ||
| | ||
| REQUIRE(result); | ||
| auto input = doc.child("input"); | ||
| REQUIRE(string(input.attribute("type").value()) == "checkbox"); | ||
| REQUIRE(string(input.attribute("checked").value()) == "checked"); | ||
| } | ||
| */ | ||
| | ||
| SECTION("Already quoted attributes remain unchanged") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("<img src=\"image.jpg\" alt=\"test image\">", | ||
| pugi::parse_default | pugi::parse_unquoted_attributes); | ||
| | ||
| REQUIRE(result); | ||
| auto img = doc.child("img"); | ||
| REQUIRE(string(img.attribute("src").value()) == "image.jpg"); | ||
| REQUIRE(string(img.attribute("alt").value()) == "test image"); | ||
| } | ||
| | ||
| SECTION("Without unquoted attributes flag should fail") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("<a href=foobar></a>", | ||
| pugi::parse_default); | ||
| | ||
| REQUIRE(!result); | ||
| REQUIRE(result.status == pugi::status_bad_attribute); | ||
| } | ||
| | ||
| SECTION("Unquoted attribute with special characters") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("<a href=foo-bar_baz.html></a>", | ||
| pugi::parse_default | pugi::parse_unquoted_attributes); | ||
| | ||
| REQUIRE(result); | ||
| REQUIRE(string(doc.child("a").attribute("href").value()) == "foo-bar_baz.html"); | ||
| } | ||
| | ||
| SECTION("Empty string") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("", | ||
| pugi::parse_default | pugi::parse_unquoted_attributes); | ||
| | ||
| // Empty string should fail - it's not a valid XML document | ||
| REQUIRE(!result); | ||
| REQUIRE(result.status == pugi::status_no_document_element); | ||
| } | ||
| | ||
| SECTION("No attributes") | ||
| { | ||
| pugi::xml_document doc; | ||
| pugi::xml_parse_result result = doc.load_string("<div>content</div>", | ||
| pugi::parse_default | pugi::parse_unquoted_attributes); | ||
| | ||
| REQUIRE(result); | ||
| REQUIRE(string(doc.child("div").text().get()) == "content"); | ||
| } | ||
| } |
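The "Without unquoted attributes flag should fail" case reflects that a strict XML parser rejects `href=foobar`. For comparison, one alternative to a parser flag is a pre-pass that adds the missing quotes before parsing. The sketch below shows that alternative only; it is not the approach taken in this PR (which changes the parser itself), and `quote_unquoted_attributes` is a hypothetical name. It assumes values end at whitespace or `>` and ignores comments, CDATA, and script bodies.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical pre-pass (not this PR's implementation): wraps unquoted
// attribute values in double quotes so a strict XML parser can accept them.
// Limitations: comments, CDATA, script bodies, and a bare '<' in text are not
// handled, and value-less attributes such as "checked" are left untouched.
std::string quote_unquoted_attributes(const std::string &html)
{
  std::string out;
  out.reserve(html.size() + 16);
  bool in_tag = false;
  std::size_t i = 0, n = html.size();

  while (i < n)
  {
    char c = html[i];
    if (!in_tag)
    {
      // Outside a tag: copy text verbatim and watch for the next '<'.
      in_tag = (c == '<');
      out += c;
      ++i;
    }
    else if (c == '>')
    {
      in_tag = false;
      out += c;
      ++i;
    }
    else if (c == '"' || c == '\'')
    {
      // Copy an already-quoted value verbatim, including both quotes.
      std::size_t end = html.find(c, i + 1);
      end = (end == std::string::npos) ? n : end + 1;
      out.append(html, i, end - i);
      i = end;
    }
    else if (c == '=' && i + 1 < n && html[i + 1] != '"' && html[i + 1] != '\'')
    {
      // Unquoted value: runs until whitespace or '>'; re-emit it quoted.
      std::size_t start = ++i;
      while (i < n && html[i] != '>' && !std::isspace(static_cast<unsigned char>(html[i]))) ++i;
      out += "=\"";
      out.append(html, start, i - start);
      out += '"';
    }
    else
    {
      out += c;
      ++i;
    }
  }
  return out;
}
```

A pre-pass like this copies the whole input, whereas handling the syntax inside the parser avoids that extra pass.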