-
Notifications
You must be signed in to change notification settings - Fork 778
Add support for offheap allocation in multianewarray evaluator on IBM Z platform #22797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add support for offheap allocation in multianewarray evaluator on IBM Z platform #22797
Conversation
|
Remove WIP when I finish testing |
|
Local tests pass. I remove the WIP and launching jenkin tests. |
r30shah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VermaSh - Can I request your review on this first for off heap?
r30shah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhances the multianewarray evaluator to support offheap allocation on
IMB Z platform. Offheap allocation for arrays with a zero-length second
dimension is not supported; in such cases, a helper call is triggered
to handle the allocation.
Please fix typo (IBM Z not IMB Z).
I have not looked into your code - but can you confirm this statement is true or not.
- We did inline zero-length first / second dimension array when off-heap is enabled. But changes in this PR will divert those case to helper (Or only zero length second dimension) and only inlines if both dimension is not zero?
d4792d2 to
f7b0fe2
Compare
|
Thanks @r30shah! I fixed the commit message. Yes, we were inlining zero length 2D arrays before, then we disable inlining offheap and with this change we enable offheap except for zero length second dim. |
|
Is it that complicated to fix for zero second length that you will do in second PR? |
No I have a few ideas but want to select a plan that has minimum impact on the non zero allocation since I expect non-zero to get called more frequently. |
| dependencies->addPostCondition(dim2SizeReg, TR::RealRegister::AssignAny); | ||
| cursor = generateRXInstruction(cg, TR::InstOpCode::LTGF, node, dim2SizeReg, generateS390MemoryReference(dimsPtrReg, 0, cg), cursor); | ||
| iComment("Load 2st dim length."); | ||
| // The size of zero length array is already loaded in size register. Jump over the array size calculation instructions if length is 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're no longer writing 0 to dim2SizeReg, this comment can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks shubham. Comment was removed
f7b0fe2 to
deaab6d
Compare
|
@hzongaro pointed that we do not inline constant non-zero second dimension 2D arrays (#22562 (review)). I am trying to see why it was added and will remove the condition if it is not necessary anymore. |
f788654 to
fa536c3
Compare
|
@ehsankianifar Is this ready for review or are you still working on adding changes? |
Thanks @VermaSh, Yes it is ready for review. |
50f14ea to
1618674
Compare
|
I upload a sample compilation log with constant zero length here for reference: |
|
Thank you! @ehsankianifar. Can you please share trace logs for constant and non-constant non-zero 2nd dimension? |
|
Thanks @VermaSh, |
fb38ac1 to
1618674
Compare
| if (isOffHeapAllocationEnabled) | ||
| { | ||
| // Store first element's address in data address field. | ||
| cursor = generateRXInstruction(cg, TR::InstOpCode::STG, node, sizeReg, generateS390MemoryReference(resultReg, fej9->getOffsetOfContiguousDataAddrField(), cg), cursor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If leaf array size is 0, won't we completely miss initializing dataAddr slot for 1st dimension array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this code will be executed if the first dimension length is not zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, I got confused by the comments. All clear now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're leaving the dataAddr slot for the first-dimension array uninitialized when the leaf array has size 0.
Thanks @VermaSh for your review. We populate the data address field in the the first dimension if there is any leaf array (first dim length is not zero). That part of code has no knowledge of the second dim length. Here is a sample I just allocated and checked the created object with first dim 1 and second dim zero ( |
|
|
||
| static bool recreateRoot = feGetEnv("TR_LowerMultiANewArrayRecreateRoot") ? true : false; | ||
| #if defined(TR_HOST_POWER) | ||
| #if defined(TR_HOST_POWER) || defined(TR_HOST_S390) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a FYI, these changes are in https://github.com/eclipse-openj9/openj9/pull/22562/files as well so there might be a conflict if that PR goes in first.
VermaSh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGT from off-heap perspective
Thank you for your review. @r30shah can we merge this PR? |
r30shah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some question and one change. Based on your response to the two question I have posed I will continue review.
| // Store first element's address in data address field. | ||
| cursor = generateRXInstruction(cg, TR::InstOpCode::STG, node, sizeReg, generateS390MemoryReference(resultReg, fej9->getOffsetOfContiguousDataAddrField(), cg), cursor); | ||
| // Subtract the header size from the dim2 size, so we can move to the next leaf by adding this value to the address of the leaf's first element. | ||
| cursor = generateRILInstruction(cg, TR::InstOpCode::SLFI, node, dim2SizeReg, (int32_t)TR::Compiler->om.contiguousArrayHeaderSizeInBytes(), cursor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we use SLGFI here ? dim2SizeReg is used as 64-bit at all places. Is this an oversight or deliberate ? I do understand that it would be impossible to have dim2SizeReg to be greater than 32-bit so may not cause any functional issue but this would not even save us any instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we know that the size is 32 bit max, I am using the upper 32 bits for storing compressed reference to the leaf array. It is not used in this code path and I can use 64bit instruction here but it may cause confusion and inconsistency along code paths.
| cursor = generateRILInstruction(cg, TR::InstOpCode::SLFI, node, dim2SizeReg, (int32_t)TR::Compiler->om.contiguousArrayHeaderSizeInBytes(), cursor); | ||
| // The length of the second dimension is stored in the upper 32 bits of the scratch register. | ||
| // Comparing the full 64-bit scratch register with its lower 32 bits sets the condition code to COND_BNE if the second dimension is non-zero. | ||
| cursor = generateRREInstruction(cg, TR::InstOpCode::CGFR, node, scratchReg, scratchReg, cursor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this was one of the reason originally I asked / recommended not to have values fused together before storing it. Can you explain the check here ? Lower half is zero and comparison would upperhalf with lowerhalf and upperhalf and CC is set ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is the most efficient way to check if the second dim is zero without extra loading or using extra registers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be, not questioning that. I want to confirm my understanding of the logic here, so that I can continue review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I misunderstood the question earlier.
The scratch register stores the first dimension length in its lower 32 bits and the second dimension length in its upper 32 bits. When using CGFR, we compare the full 64-bit value of the scratch register with its 32-bit lower half. These values match only if the upper half (second dimension length) is zero.
Currently, I don’t have a free register to split the dimension lengths, so I must either reuse the upper half of the scratch register or allocate a new register to hold the value.
| // The length of the second dimension is stored in the upper 32 bits of the scratch register. | ||
| // Comparing the full 64-bit scratch register with its lower 32 bits sets the condition code to COND_BNE if the second dimension is non-zero. | ||
| cursor = generateRREInstruction(cg, TR::InstOpCode::CGFR, node, scratchReg, scratchReg, cursor); | ||
| // The condition code has already been set and must remain unchanged during the second dimension allocation loop in this code path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this ? Can this be avoided, in general we should avoid writing the evalutors where condition code is set lot more instructions before. This becomes difficult to debug and fix in case there comes an issue in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem Summary
The goal is to store the data start address only when the second dimension length is non-zero. I evaluated several approaches:
Approaches Considered
-
Separate allocation loops
- One loop for zero-length allocations and another for non-zero-length.
- Branching is based on the second dimension length.
- Drawbacks: Increases code size. Moving the zero-length loop to OOL may introduce regressions.
-
Unconditional store + post-zeroing
- Store the address regardless of length.
- After allocation, loop through leaf arrays and zero the address if the length is zero.
- Drawbacks: May cause performance regressions due to unnecessary stores and post-processing.
-
Conditional store inside loop
- Check the length in each iteration and store the address only if non-zero.
- Drawbacks: Adds overhead due to repeated condition checks.
-
Selected approach: Pre-loop check + conditional store
- Check the length once before entering the loop.
- Use a conditional store inside the loop.
- Note: Comments were added to ensure no instructions inside the loop alter the condition code. Future contributors are expected to maintain this constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ehsankianifar Thanks for the detailed response. I am still skeptical of this approach as I do not think a comment like this is sufficient. You set the condition code before entering a loop and then inside a loop after some n instruction, you use that CC to store.
Though given you have shared 4 approaches you have considered so far, I will take a look at the log file once again and see if I can suggest better approach.
This is the log file #22797 (comment) I am looking at. If that is not the one with latest, please upload latest one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ehsankianifar - Is the log I pointed out generates the code in this block ? I do not see CGFR instruction in the log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @r30shah,
That log is to compare the tree transformation for constant length array vs non-constant array. Here is a compilation log for offHeap enabled allocation:
OffheapComp.log
r30shah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // The length of the second dimension is stored in the upper 32 bits of the scratch register. | ||
| // Comparing the full 64-bit scratch register with its lower 32 bits sets the condition code to COND_BNE if the second dimension is non-zero. | ||
| cursor = generateRREInstruction(cg, TR::InstOpCode::CGFR, node, scratchReg, scratchReg, cursor); | ||
| // The condition code has already been set and must remain unchanged during the second dimension allocation loop in this code path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ehsankianifar - Is the log I pointed out generates the code in this block ? I do not see CGFR instruction in the log.
r30shah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ehsankianifar Can I request you to rebase this one? I am picking the review on this again.
Enhances the multianewarray evaluator to support offheap allocation on IBM Z platform. Offheap allocation for arrays with a zero-length second dimension is not supported; in such cases, a helper call is triggered to handle the allocation. signed-off-by: Ehsan Kiani Far <ehsan.kianifar@gmail.com>
Enhances the multianewarray evaluator to support zero-length offheap allocation on IBM Z platform. signed-off-by: Ehsan Kiani Far <ehsan.kianifar@gmail.com>
Previously, inlining of constant 2D arrays with a non-zero second dimension was disabled on z due to limitations in handling such cases. However, recent improvements have resolved those issues, making it safe and beneficial to inline these arrays. Remove the constraint that prevented inlining of non-zero constant second dimension 2D arrays. signed-off-by: Ehsan Kiani Far <ehsan.kianifar@gmail.com>
0adb331 to
425601f
Compare
Refactored off-heap 2D array allocation logic to improve clarity and maintainability on IBM Z platform. Previously, STOCG was used to store the data address only when the length was non-zero, relying on condition codes to control flow. This approach introduced hidden dependencies on allocator condition codes, making the code harder to read and prone to future bugs. Updated the implementation to allocate non-zero length off-heap arrays inline, while handling zero-length arrays out of line. This removes reliance on condition codes for allocation decisions, simplifies control flow, and ensures consistent behavior across allocation paths. signed-off-by: Ehsan Kiani Far <ehsan.kianifar@gmail.com>
425601f to
e3ab377
Compare
|
I moved the allocation of off-heap zero second dimension length leaf arrays OOL. Tests are passing locally and I started a few jenkin tests. |

@r30shah
This PR add support for offheap 2D array allocation. It is calling helper if the second dim is zero length and allocating offheap. I have a few possible solutions for that.