@onalante-ebay onalante-ebay commented Dec 26, 2025

While the additional instructions required to obtain one bit per lane
make this more expensive than directly supporting variable-sized bit
groups (as done in Zstandard[^1]), the result is still an improvement
over the current lane-by-lane algorithm.
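
To make the "one bit per lane" goal concrete, here is a portable sketch that contrasts a lane-by-lane extraction with a packed variant. It is not the code merged in this PR: the names `Lanes`, `pack_half`, `bitmask_lane_by_lane`, and `bitmask_packed` are hypothetical, the 16-byte array stands in for a NEON comparison result, and the mask-and-multiply trick is a well-known scalar technique assumed here only for illustration. It also assumes every lane is either `0x00` or `0xFF`, as a vector comparison would produce.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 16-lane byte comparison result (each lane 0x00 or 0xFF).
using Lanes = std::array<std::uint8_t, 16>;

// Baseline: visit every lane and copy its top bit into bit i of the result.
inline std::uint16_t bitmask_lane_by_lane(const Lanes& lanes) {
    std::uint16_t mask = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        mask = static_cast<std::uint16_t>(mask | ((lanes[i] >> 7) << i));
    }
    return mask;
}

// Stand-in for viewing eight lanes as one 64-bit value; lane i lands in byte i.
inline std::uint64_t pack_half(const Lanes& lanes, std::size_t first) {
    std::uint64_t word = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        word |= static_cast<std::uint64_t>(lanes[first + i]) << (8 * i);
    }
    return word;
}

// Packed variant: keep a distinct bit in each byte, then sum all bytes into
// the top byte with one multiply. Assumes lanes are all-zeros or all-ones.
inline std::uint16_t bitmask_packed(const Lanes& lanes) {
    constexpr std::uint64_t select = 0x8040201008040201ULL; // bit i of byte i
    constexpr std::uint64_t gather = 0x0101010101010101ULL; // sums bytes into byte 7
    const std::uint64_t lo = ((pack_half(lanes, 0) & select) * gather) >> 56;
    const std::uint64_t hi = ((pack_half(lanes, 8) & select) * gather) >> 56;
    return static_cast<std::uint16_t>(lo | (hi << 8));
}
```

Both functions return the same 16-bit mask for valid comparison results; the packed variant replaces sixteen per-lane steps with two multiply-and-shift steps.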

To reduce duplication, XSIMD_LITTLE_ENDIAN is moved from
math/xsimd_rem_pio2.hpp to config/xsimd_config.hpp, and will now be
available outside the defining header.
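
As a rough illustration of what a shared endianness macro can look like once it lives in a configuration header, the sketch below defines a hypothetical `EXAMPLE_LITTLE_ENDIAN` and a runtime probe to cross-check it. The actual definition of `XSIMD_LITTLE_ENDIAN` in xsimd is not shown in this conversation and may differ; the compiler macros used here (`__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__`, `_WIN32`) are assumptions about the build environment.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a config-level endianness macro.
// Assumes GCC/Clang-style byte-order macros, or a Windows target (little-endian).
#if defined(_WIN32) || (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define EXAMPLE_LITTLE_ENDIAN 1
#else
#define EXAMPLE_LITTLE_ENDIAN 0
#endif

// Runtime probe: on a little-endian machine the lowest-addressed byte of the
// integer 1 is 1.
inline bool runtime_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

Any header that includes the configuration header can then branch on the macro at compile time instead of redefining it locally.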

Footnotes

  1. See facebook/zstd#3139 ("[lazy] Optimize ZSTD_row_getMatchMask for levels 8-10 for ARM"), namely `ZSTD_row_matchMaskGroupWidth`.

@serge-sans-paille
Contributor

I've suggested an improvement of the 64 bit version there: https://godbolt.org/z/b7henc933

@onalante-ebay
Contributor Author

Applied, thank you for the suggestion. I will fix the GCC build in a moment.

@serge-sans-paille
Contributor

FYI, I'm working on the aarch64 version in #1237

serge-sans-paille added a commit that referenced this pull request Dec 28, 2025
serge-sans-paille added a commit that referenced this pull request Dec 28, 2025
@serge-sans-paille
Contributor

LGTM once CI is good. I'll squash everything in a single commit and merge. Thanks for this!

@onalante-ebay
Contributor Author

Great, thank you for your assistance.

@serge-sans-paille serge-sans-paille merged commit 6f7ee16 into xtensor-stack:master Dec 29, 2025
58 of 59 checks passed
@onalante-ebay onalante-ebay deleted the neon_bitmask branch December 29, 2025 19:02