jayaddison
Contributor

This is a small consistency fixup relating to the way that attribute names are retrieved; it also makes some follow-up refactoring work a little cleaner.

Parsing still works correctly when we consume a single character at a time during attribute name tokenization, and the change doesn't appear to affect performance either way.
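
To illustrate why the two approaches should tokenize identically, here is a minimal sketch. It uses a hypothetical `ToyStream` rather than html5lib's real input stream, and the names `ToyStream`, `chars_until`, `TERMINATORS`, and both `attribute_name_*` functions are assumptions made for illustration. The terminator set is simplified and error handling is omitted.

```python
import string

ASCII_LETTERS = frozenset(string.ascii_letters)
# Simplified set of characters that end an attribute name (assumption).
TERMINATORS = frozenset("=>/ \t\n\f")


class ToyStream:
    """Hypothetical stand-in for a tokenizer input stream."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def char(self):
        """Return the next character, or None at end of input."""
        if self.pos >= len(self.text):
            return None
        c = self.text[self.pos]
        self.pos += 1
        return c

    def chars_until(self, characters, opposite=False):
        """Consume up to the first character in `characters`.

        With opposite=True, consume while characters ARE in the set.
        """
        start = self.pos
        while self.pos < len(self.text):
            if (self.text[self.pos] in characters) != opposite:
                break
            self.pos += 1
        return self.text[start:self.pos]


def attribute_name_per_char(stream):
    """Build the name one character at a time."""
    name = ""
    while True:
        c = stream.char()
        if c is None or c in TERMINATORS:
            return name
        name += c


def attribute_name_bulk(stream):
    """Build the name, grabbing each run of ASCII letters in one call."""
    name = ""
    while True:
        c = stream.char()
        if c is None or c in TERMINATORS:
            return name
        name += c
        if c in ASCII_LETTERS:
            name += stream.chars_until(ASCII_LETTERS, opposite=True)
```

In this sketch the bulk read only ever consumes letters, which the per-character loop would have appended anyway, so both functions return the same name and leave the stream at the same position.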

@gsnedders
Member

> it also makes some follow-up refactoring work a little cleaner

FWIW, I think it's pretty likely that any Cython-compiled version of html5lib, once that exists, will use charsUntil more widely than we do today.

@jayaddison
Contributor Author

> it also makes some follow-up refactoring work a little cleaner
>
> FWIW, I think it's pretty likely that any Cython-compiled version of html5lib, once that exists, will use charsUntil more widely than we do today.

That's a good goal/consideration to keep in mind, thanks. For this instance, the suggested change is largely to help indicate that there's no accidental change-of-behaviour introduced by the refactoring in https://github.com/html5lib/html5lib-python/pull/521/files#diff-84be0df9e74521d407f26e2277a2c70be21dbe6012fea9a5786721c5027e2cfaL894-R868

It also seems consistent with the comment and logic in tagNameState (I didn't copy the comment over, but could do so).

@jayaddison
Contributor Author

Cleaning up some old / stale pull requests; please let me know if this changeset is considered worthwhile and I'll reopen if so.

@jayaddison jayaddison closed this Dec 24, 2022