@Ray0907 commented Nov 22, 2025

  • Replaced bit-by-bit loops in HScanForEdge with word-level processing using hardware population count (POPCNT).
  • Added a helper to handle 32-bit word alignment and boundary masking efficiently.
  • Optimized VScanForEdge by hoisting bit position calculations out of the inner loop.
  • Added cross-platform POPCOUNT macros supporting GCC/Clang (__builtin_popcount) and MSVC (__popcnt).
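
For readers unfamiliar with the technique, here is a minimal sketch of word-level counting with boundary masks. It is not the code from this PR: the names `SKETCH_POPCOUNT32` and `CountBitsInRange` are hypothetical, and the sketch assumes pixels are packed MSB-first into 32-bit words (pixel 0 is the most significant bit of the first word).

```cpp
#include <cstdint>

// Portable 32-bit population count. The macro name is hypothetical.
#if defined(_MSC_VER)
#  include <intrin.h>
#  define SKETCH_POPCOUNT32(x) static_cast<int>(__popcnt(x))
#else
#  define SKETCH_POPCOUNT32(x) __builtin_popcount(x)
#endif

// Counts set bits in the half-open pixel range [x_start, x_end) of one row.
// Assumption: pixel 0 is the most significant bit of row[0].
static int CountBitsInRange(const uint32_t *row, int x_start, int x_end) {
  if (x_start >= x_end) {
    return 0;
  }
  const int first_word = x_start >> 5;
  const int last_word = (x_end - 1) >> 5;
  // Keeps pixels from x_start to the end of the first word.
  const uint32_t head_mask = 0xFFFFFFFFu >> (x_start & 31);
  // Keeps pixels from the start of the last word through x_end - 1.
  const uint32_t tail_mask = 0xFFFFFFFFu << (31 - ((x_end - 1) & 31));
  if (first_word == last_word) {
    // The whole range sits inside one word: apply both masks at once.
    return SKETCH_POPCOUNT32(row[first_word] & head_mask & tail_mask);
  }
  int count = SKETCH_POPCOUNT32(row[first_word] & head_mask);
  // Interior words are fully covered, so no mask is needed.
  for (int w = first_word + 1; w < last_word; ++w) {
    count += SKETCH_POPCOUNT32(row[w]);
  }
  count += SKETCH_POPCOUNT32(row[last_word] & tail_mask);
  return count;
}
```

A bit-by-bit loop over the same range touches each pixel once, while this version touches each 32-bit word once. Whether that produces a measurable speedup depends on the range widths involved and on whether the compiler emits a hardware POPCNT instruction for the target.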

@egorpugin (Contributor)

What timing improvements do you observe?
Please provide the OS and compiler details.
