x86: Improve String.hashCode for medium/long strings #22875

BradleyWood · 2025-10-31T20:35:04Z

This commit improves String.hashCode() performance by calling vectorizedHashCodeLoopHelper(...) to
process longer string in an unrolled vectorized loop with larger vector lengths. Minor tweaks were made, including to change how multiplication vectors are loaded.

0xdaryl · 2025-10-31T23:59:24Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

+   int32_t unrollCount = 1;
+#endif
+
+   int32_t charsPerMainLoopIteration = unrollCount * vectorSizes[vl];


On x86, getMaxPreferredVectorLength() only returns one of TR::VectorLength[128|256|512], whose enum values map to integers 1,2,3. This doesn't map correctly to the zero-indexed vectorSizes array and will exceed it in the case of TR::VectorLength512.

That's a great catch, and likely hurts performance by causing extra iterations in the small loop.

0xdaryl · 2025-11-01T00:13:16Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

   deps->addPostCondition(hashXMM, TR::RealRegister::NoReg, cg);

+   // Generate Main Loop; 4x Unrolled seems to yield the best performance for large arrays
+   static char *unrollVar = feGetEnv("TR_setInlineStringHashCodeUnrollCount");


Move this under the #ifdef if it only applies to the 64-bit case.

0xdaryl · 2025-11-01T00:16:20Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

+
+   generateLabelInstruction(TR::InstOpCode::label, node, bigLoopLabel, cg);
+
+   // Secondary unrolled vectorized loop with larget vector lengths


larget == large?

BradleyWood · 2025-11-03T14:59:14Z

@0xdaryl See the force-push.

0xdaryl · 2025-11-10T16:43:24Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

+   generateRegRegInstruction(TR::InstOpCode::XORRegReg(), node, index, index, cg);
+   generateRegRegInstruction(TR::InstOpCode::XORRegReg(), node, hash, hash, cg);
+   generateRegRegInstruction(TR::InstOpCode::MOV4RegReg, node, loopLimit, length, cg);
+   generateRegImmInstruction(TR::InstOpCode::AND4RegImm4, node, loopLimit, charsPerMainLoopIteration - 1, cg);


This only works accurately if charsPerMainLoopIteration is a power of 2 (I think). Maybe add an assert in the code here to check for that. Practically speaking, I think this can only happen if someone specifies an unrollCount that is not a power of 2.

There is an assert checking for this in the vectorizedHashCodeHelper. Setting TR_setInlineStringHashCodeUnrollCount=7 fires the following assert "Unroll count must be 1/2/4".

0xdaryl · 2025-11-10T17:18:22Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

-   generateRegRegInstruction(TR::InstOpCode::CMP4RegReg, node, index, length, cg);
+   generateRegRegInstruction(TR::InstOpCode::CMP4RegReg, node, index, loopLimit, cg);
   generateLabelInstruction(TR::InstOpCode::JL4, node, loopLabel, cg);
   generateLabelInstruction(TR::InstOpCode::label, node, endLabel, deps, cg);


deps should be on the doneLabel to capture all register usage across this entire evaluator to ensure correct register assignment. Be sure all register usage is reflected (for example, you removed length from the dep conditions earlier) if it appears inside any codegen-created control flow.

0xdaryl · 2025-11-13T13:04:17Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

+   deps->addPostCondition(loopLimit, TR::RealRegister::NoReg, cg);
   deps->addPostCondition(multiplierXMM, TR::RealRegister::NoReg, cg);
   deps->addPostCondition(tmpXMM, TR::RealRegister::NoReg, cg);
   deps->addPostCondition(hashXMM, TR::RealRegister::NoReg, cg);


length and hash are missing from the deps and used in the control flow below.

See force-push

This commit improves String.hashCode() performance by calling vectorizedHashCodeLoopHelper(...) to process longer string in an unrolled vectorized loop with larger vector lengths. Minor tweaks were made, including to change how multiplication vectors are loaded. Signed-off-by: Bradley Wood <bradley.wood@ibm.com>

0xdaryl · 2025-11-13T16:22:02Z

Jenkins test sanity.functional,sanity.openjdk xlinux,win jdk21

0xdaryl · 2025-11-14T00:18:49Z

There are some failures in the PR testing that are reproducible with a grinder. I believe they might be related to this PR because similar failures were seen a couple of years back in #18371 that were "fixed" when a suspected PR was reverted. I don't believe that PR has anything to do with this PR, but it does demonstrate that JIT changes can cause failures in these tests. Please investigate.

BradleyWood · 2025-11-19T16:44:29Z

I reviewed the failures, all seem to happen the same way. In each case, a NullPointerException is preceded by a testng error message complaining that class is not found. The NPE seems to be genuine, and not an artifact of the signal handler. It also happens in hotspot if the class under test is truly missing from the classpath. The 'missing' class seems to have no relation to the underlying test. I am not sure why it would even attempt to load the class, as it is a part of a different test completely.

I have not seen any failures that manifest in a different way. I attempted to produce the issue locally, and create a reduced test case, but wasn't able to reproduce the issue. I tried increasing the register pressure, but this also did not yield any issue yet. Despite that, I think register assignment is the only plausible explanation.

I reviewed the way registers are handled in this PR. Other than a potential issue with the start/end of inner-control-flow flags on labels, nothing sticks out to me. I tried updating the ICF flags key labels, but still had a couple occurrences in Jenkins. I will try reverting the last few edits made surrounding RA, the code does look suspicious after your review comment, but there may have been a reason I decided to implement it that way. I could use another set of eyes on the RA related code, especially the part where we call out to the vectorizedHashCodeLoopHelper.

0xdaryl · 2025-11-28T19:55:35Z

Are you able to run a grinder (not locally, but using the farm machines) on the test to get an idea of what the failure rate is? I believe I restarted the test twice during the CI testing and it failed both times, but perhaps you could set up 10 grinders to see if this is an intermittent issue or not.

BradleyWood added comp:jit perf arch:x86 labels Oct 31, 2025

BradleyWood assigned 0xdaryl Oct 31, 2025

0xdaryl reviewed Nov 1, 2025

View reviewed changes

BradleyWood force-pushed the shc_hybrid branch from 2168228 to 768f1b1 Compare November 3, 2025 14:58

0xdaryl reviewed Nov 10, 2025

View reviewed changes

BradleyWood force-pushed the shc_hybrid branch from 768f1b1 to a59b029 Compare November 11, 2025 20:11

0xdaryl reviewed Nov 13, 2025

View reviewed changes

BradleyWood force-pushed the shc_hybrid branch from a59b029 to c4ce760 Compare November 13, 2025 15:41


		generateLabelInstruction(TR::InstOpCode::label, node, bigLoopLabel, cg);

		// Secondary unrolled vectorized loop with larget vector lengths

x86: Improve String.hashCode for medium/long strings #22875

Are you sure you want to change the base?

x86: Improve String.hashCode for medium/long strings #22875

Uh oh!

Conversation

BradleyWood commented Oct 31, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BradleyWood commented Nov 3, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

0xdaryl commented Nov 13, 2025

Uh oh!

0xdaryl commented Nov 14, 2025

Uh oh!

BradleyWood commented Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

0xdaryl commented Nov 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

BradleyWood commented Nov 19, 2025 •

edited

Loading