Implement optimized movemasks for NEON #1236

onalante-ebay · 2025-12-26T17:47:47Z

While the additional instructions required to obtain one bit per lane
make this more expensive than directly supporting variable-sized bit
groups (as done in Zstandard¹), the result is still an improvement
over the current lane-by-lane algorithm.
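To make the trade-off concrete, here is a minimal portable sketch of the two mask layouts being compared. It is plain scalar C++ rather than NEON intrinsics, and the names `movemask_lane_by_lane`, `grouped_mask_16x8`, and `compact_groups` are hypothetical; they are not xsimd or Zstandard APIs. The 4-bit group width is chosen purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Reference lane-by-lane movemask: bit i of the result is set when lane i
// is true (non-zero). This is the one-bit-per-lane layout.
template <class T, std::size_t N>
std::uint64_t movemask_lane_by_lane(const T (&lanes)[N])
{
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < N; ++i)
        mask |= static_cast<std::uint64_t>(lanes[i] != 0) << i;
    return mask;
}

// Grouped layout (assumed width: 4 bits per lane): every true lane
// contributes a run of four set bits, so 16 byte lanes fill 64 bits.
inline std::uint64_t grouped_mask_16x8(const std::uint8_t (&lanes)[16])
{
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < 16; ++i)
        mask |= (lanes[i] ? 0xFull : 0ull) << (4 * i);
    return mask;
}

// Extra step needed to go from a grouped mask to one bit per lane:
// keep the lowest bit of each group and pack the results together.
inline std::uint32_t compact_groups(std::uint64_t grouped,
                                    unsigned group_width,
                                    unsigned lane_count)
{
    std::uint32_t mask = 0;
    for (unsigned i = 0; i < lane_count; ++i)
        mask |= static_cast<std::uint32_t>((grouped >> (i * group_width)) & 1u) << i;
    return mask;
}
```

A caller that can consume the grouped layout directly skips `compact_groups` entirely, which is the saving the first paragraph alludes to; a `movemask`-style API has to pay for the compaction.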

To reduce duplication, XSIMD_LITTLE_ENDIAN is moved from
math/xsimd_rem_pio2.hpp to config/xsimd_config.hpp, and will now be
available outside the defining header.
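For readers unfamiliar with this kind of configuration macro, the sketch below shows one common way a header-level endianness flag can be defined and cross-checked at run time. `EXAMPLE_LITTLE_ENDIAN` and `is_little_endian_runtime` are hypothetical names used only here; this is not the actual definition of `XSIMD_LITTLE_ENDIAN`, and it assumes a compiler that provides `__BYTE_ORDER__` (GCC and Clang do).

```cpp
#include <cstdint>
#include <cstring>

// Compile-time flag (hypothetical name), derived from compiler-provided macros.
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
#define EXAMPLE_LITTLE_ENDIAN (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#elif defined(_WIN32)
// Assumption: Windows targets are little-endian.
#define EXAMPLE_LITTLE_ENDIAN 1
#else
#error "Cannot determine endianness in this sketch"
#endif

// Run-time cross-check: on a little-endian machine the least significant
// byte of an integer is stored first in memory.
inline bool is_little_endian_runtime()
{
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```

Byte order matters whenever a vector of narrow lanes is reinterpreted as a wider integer, because lane 0 lands in the low bits only on little-endian targets.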

¹ See facebook/zstd#3139 ("[lazy] Optimize ZSTD_row_getMatchMask for levels 8-10 for ARM"), namely ZSTD_row_matchMaskGroupWidth.

serge-sans-paille · 2025-12-26T18:54:48Z

I've suggested an improvement of the 64 bit version there: https://godbolt.org/z/b7henc933

onalante-ebay · 2025-12-26T19:07:56Z

Applied, thank you for the suggestion. I will fix the GCC build in a moment.

include/xsimd/arch/xsimd_neon.hpp

serge-sans-paille · 2025-12-27T15:32:50Z

FYI, I'm working on the aarch64 version in #1237

As a complement to #1236

serge-sans-paille · 2025-12-28T13:13:53Z

LGTM once CI is good. I'll squash everything in a single commit and merge. Thanks for this!

onalante-ebay · 2025-12-29T17:23:02Z

Great, thank you for your assistance.

onalante-ebay force-pushed the neon_bitmask branch from 10bab43 to c75a0ea on December 26, 2025 17:52

onalante-ebay force-pushed the neon_bitmask branch from c75a0ea to ffaa19d on December 26, 2025 17:54

Improve 64-bit bitmask code

7aa9df4

Fix build when if constexpr is unavailable

d205b15

onalante-ebay force-pushed the neon_bitmask branch from 371ab6b to d205b15 on December 26, 2025 19:20

serge-sans-paille reviewed Dec 27, 2025

include/xsimd/arch/xsimd_neon.hpp (outdated, resolved)

Use SSE2NEON movemask kernels

ebf7519

onalante-ebay force-pushed the neon_bitmask branch from eaa1835 to ebf7519 on December 27, 2025 20:32

serge-sans-paille added a commit that referenced this pull request Dec 28, 2025

More efficient batch_bool::mask() for aarch64

5fac2ad

As a complement to #1236

serge-sans-paille added a commit that referenced this pull request Dec 28, 2025

More efficient batch_bool::mask() for aarch64

fa3d5f9

As a complement to #1236

serge-sans-paille merged commit 6f7ee16 into xtensor-stack:master Dec 29, 2025
58 of 59 checks passed

onalante-ebay deleted the neon_bitmask branch December 29, 2025 19:02
