Skip to content

Conversation

@ehsankianifar
Copy link
Contributor

@r30shah
This PR add support for offheap 2D array allocation. It is calling helper if the second dim is zero length and allocating offheap. I have a few possible solutions for that.

@ehsankianifar ehsankianifar changed the title Z: Add support for offheap allocation in multianewarray evaluator Add support for offheap allocation in multianewarray evaluator on IBM Z platform Oct 17, 2025
@ehsankianifar ehsankianifar marked this pull request as draft October 17, 2025 15:39
@ehsankianifar
Copy link
Contributor Author

Remove WIP when I finish testing

@ehsankianifar
Copy link
Contributor Author

Local tests pass. I remove the WIP and launching jenkin tests.
The compilation logs:
offHeap.log

@ehsankianifar ehsankianifar self-assigned this Oct 17, 2025
@ehsankianifar ehsankianifar requested a review from r30shah October 17, 2025 16:53
@ehsankianifar ehsankianifar marked this pull request as ready for review October 17, 2025 16:53
Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VermaSh - Can I request your review on this first for off heap?

Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enhances the multianewarray evaluator to support offheap allocation on
IMB Z platform. Offheap allocation for arrays with a zero-length second
dimension is not supported; in such cases, a helper call is triggered
to handle the allocation.

Please fix typo (IBM Z not IMB Z).

I have not looked into your code - but can you confirm this statement is true or not.

  • We did inline zero-length first / second dimension array when off-heap is enabled. But changes in this PR will divert those case to helper (Or only zero length second dimension) and only inlines if both dimension is not zero?

@ehsankianifar ehsankianifar force-pushed the Z_AddOffheapSupportInMultianewarrayEvaluator branch from d4792d2 to f7b0fe2 Compare October 20, 2025 13:47
@ehsankianifar
Copy link
Contributor Author

Thanks @r30shah! I fixed the commit message. Yes, we were inlining zero length 2D arrays before, then we disable inlining offheap and with this change we enable offheap except for zero length second dim.
A subsequent PR will fix inlining of zero second length arrays.

@r30shah
Copy link
Contributor

r30shah commented Oct 20, 2025

Is it that complicated to fix for zero second length that you will do in second PR?

@ehsankianifar
Copy link
Contributor Author

Is it that complicated to fix for zero second length that you will do in second PR?

No I have a few ideas but want to select a plan that has minimum impact on the non zero allocation since I expect non-zero to get called more frequently.

dependencies->addPostCondition(dim2SizeReg, TR::RealRegister::AssignAny);
cursor = generateRXInstruction(cg, TR::InstOpCode::LTGF, node, dim2SizeReg, generateS390MemoryReference(dimsPtrReg, 0, cg), cursor);
iComment("Load 2st dim length.");
// The size of zero length array is already loaded in size register. Jump over the array size calculation instructions if length is 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're no longer writing 0 to dim2SizeReg, this comment can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks shubham. Comment was removed

@ehsankianifar ehsankianifar force-pushed the Z_AddOffheapSupportInMultianewarrayEvaluator branch from f7b0fe2 to deaab6d Compare October 20, 2025 16:17
@ehsankianifar
Copy link
Contributor Author

@r30shah @VermaSh I pushed the changes to support zero length offheap allocation as well as non-zero length.

@ehsankianifar
Copy link
Contributor Author

ehsankianifar commented Oct 23, 2025

@hzongaro pointed that we do not inline constant non-zero second dimension 2D arrays (#22562 (review)). I am trying to see why it was added and will remove the condition if it is not necessary anymore.

@ehsankianifar ehsankianifar force-pushed the Z_AddOffheapSupportInMultianewarrayEvaluator branch 2 times, most recently from f788654 to fa536c3 Compare October 24, 2025 16:29
@VermaSh
Copy link
Contributor

VermaSh commented Oct 27, 2025

@ehsankianifar Is this ready for review or are you still working on adding changes?

@ehsankianifar
Copy link
Contributor Author

@ehsankianifar Is this ready for review or are you still working on adding changes?

Thanks @VermaSh, Yes it is ready for review.

@ehsankianifar ehsankianifar force-pushed the Z_AddOffheapSupportInMultianewarrayEvaluator branch from 50f14ea to 1618674 Compare October 27, 2025 18:50
@ehsankianifar
Copy link
Contributor Author

I upload a sample compilation log with constant zero length here for reference:
zeroLength.log
FYI @VermaSh

@VermaSh
Copy link
Contributor

VermaSh commented Oct 28, 2025

Thank you! @ehsankianifar. Can you please share trace logs for constant and non-constant non-zero 2nd dimension?

@ehsankianifar
Copy link
Contributor Author

Thanks @VermaSh,
The node after post lower trees is always the same regardless of the argument types and values. Here is an example of non constant length compilation log:
jitCompilation.log
The node in Pre Instruction Selection is exactly like the const zero length logs.

@ehsankianifar ehsankianifar force-pushed the Z_AddOffheapSupportInMultianewarrayEvaluator branch from fb38ac1 to 1618674 Compare November 4, 2025 20:59
if (isOffHeapAllocationEnabled)
{
// Store first element's address in data address field.
cursor = generateRXInstruction(cg, TR::InstOpCode::STG, node, sizeReg, generateS390MemoryReference(resultReg, fej9->getOffsetOfContiguousDataAddrField(), cg), cursor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If leaf array size is 0, won't we completely miss initializing dataAddr slot for 1st dimension array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this code will be executed if the first dimension length is not zero.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, I got confused by the comments. All clear now

Copy link
Contributor

@VermaSh VermaSh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're leaving the dataAddr slot for the first-dimension array uninitialized when the leaf array has size 0.

@ehsankianifar
Copy link
Contributor Author

I think we're leaving the dataAddr slot for the first-dimension array uninitialized when the leaf array has size 0.

Thanks @VermaSh for your review. We populate the data address field in the the first dimension if there is any leaf array (first dim length is not zero). That part of code has no knowledge of the second dim length. Here is a sample I just allocated and checked the created object with first dim 1 and second dim zero (obj[1][0]). Everything looks correct:
Screenshot 2025-11-05 at 10 24 01 AM


static bool recreateRoot = feGetEnv("TR_LowerMultiANewArrayRecreateRoot") ? true : false;
#if defined(TR_HOST_POWER)
#if defined(TR_HOST_POWER) || defined(TR_HOST_S390)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a FYI, these changes are in https://github.com/eclipse-openj9/openj9/pull/22562/files as well so there might be a conflict if that PR goes in first.

Copy link
Contributor

@VermaSh VermaSh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGT from off-heap perspective

@ehsankianifar
Copy link
Contributor Author

LGT from off-heap perspective

Thank you for your review. @r30shah can we merge this PR?

Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some question and one change. Based on your response to the two question I have posed I will continue review.

// Store first element's address in data address field.
cursor = generateRXInstruction(cg, TR::InstOpCode::STG, node, sizeReg, generateS390MemoryReference(resultReg, fej9->getOffsetOfContiguousDataAddrField(), cg), cursor);
// Subtract the header size from the dim2 size, so we can move to the next leaf by adding this value to the address of the leaf's first element.
cursor = generateRILInstruction(cg, TR::InstOpCode::SLFI, node, dim2SizeReg, (int32_t)TR::Compiler->om.contiguousArrayHeaderSizeInBytes(), cursor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use SLGFI here ? dim2SizeReg is used as 64-bit at all places. Is this an oversight or deliberate ? I do understand that it would be impossible to have dim2SizeReg to be greater than 32-bit so may not cause any functional issue but this would not even save us any instruction.

Copy link
Contributor Author

@ehsankianifar ehsankianifar Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know that the size is 32 bit max, I am using the upper 32 bits for storing compressed reference to the leaf array. It is not used in this code path and I can use 64bit instruction here but it may cause confusion and inconsistency along code paths.

cursor = generateRILInstruction(cg, TR::InstOpCode::SLFI, node, dim2SizeReg, (int32_t)TR::Compiler->om.contiguousArrayHeaderSizeInBytes(), cursor);
// The length of the second dimension is stored in the upper 32 bits of the scratch register.
// Comparing the full 64-bit scratch register with its lower 32 bits sets the condition code to COND_BNE if the second dimension is non-zero.
cursor = generateRREInstruction(cg, TR::InstOpCode::CGFR, node, scratchReg, scratchReg, cursor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this was one of the reason originally I asked / recommended not to have values fused together before storing it. Can you explain the check here ? Lower half is zero and comparison would upperhalf with lowerhalf and upperhalf and CC is set ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is the most efficient way to check if the second dim is zero without extra loading or using extra registers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be, not questioning that. I want to confirm my understanding of the logic here, so that I can continue review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I misunderstood the question earlier.
The scratch register stores the first dimension length in its lower 32 bits and the second dimension length in its upper 32 bits. When using CGFR, we compare the full 64-bit value of the scratch register with its 32-bit lower half. These values match only if the upper half (second dimension length) is zero.
Currently, I don’t have a free register to split the dimension lengths, so I must either reuse the upper half of the scratch register or allocate a new register to hold the value.

// The length of the second dimension is stored in the upper 32 bits of the scratch register.
// Comparing the full 64-bit scratch register with its lower 32 bits sets the condition code to COND_BNE if the second dimension is non-zero.
cursor = generateRREInstruction(cg, TR::InstOpCode::CGFR, node, scratchReg, scratchReg, cursor);
// The condition code has already been set and must remain unchanged during the second dimension allocation loop in this code path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this ? Can this be avoided, in general we should avoid writing the evalutors where condition code is set lot more instructions before. This becomes difficult to debug and fix in case there comes an issue in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem Summary

The goal is to store the data start address only when the second dimension length is non-zero. I evaluated several approaches:


Approaches Considered

  1. Separate allocation loops

    • One loop for zero-length allocations and another for non-zero-length.
    • Branching is based on the second dimension length.
    • Drawbacks: Increases code size. Moving the zero-length loop to OOL may introduce regressions.
  2. Unconditional store + post-zeroing

    • Store the address regardless of length.
    • After allocation, loop through leaf arrays and zero the address if the length is zero.
    • Drawbacks: May cause performance regressions due to unnecessary stores and post-processing.
  3. Conditional store inside loop

    • Check the length in each iteration and store the address only if non-zero.
    • Drawbacks: Adds overhead due to repeated condition checks.
  4. Selected approach: Pre-loop check + conditional store

    • Check the length once before entering the loop.
    • Use a conditional store inside the loop.
    • Note: Comments were added to ensure no instructions inside the loop alter the condition code. Future contributors are expected to maintain this constraint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehsankianifar Thanks for the detailed response. I am still skeptical of this approach as I do not think a comment like this is sufficient. You set the condition code before entering a loop and then inside a loop after some n instruction, you use that CC to store.
Though given you have shared 4 approaches you have considered so far, I will take a look at the log file once again and see if I can suggest better approach.

This is the log file #22797 (comment) I am looking at. If that is not the one with latest, please upload latest one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehsankianifar - Is the log I pointed out generates the code in this block ? I do not see CGFR instruction in the log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @r30shah,
That log is to compare the tree transformation for constant length array vs non-constant array. Here is a compilation log for offHeap enabled allocation:
OffheapComp.log

Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The length of the second dimension is stored in the upper 32 bits of the scratch register.
// Comparing the full 64-bit scratch register with its lower 32 bits sets the condition code to COND_BNE if the second dimension is non-zero.
cursor = generateRREInstruction(cg, TR::InstOpCode::CGFR, node, scratchReg, scratchReg, cursor);
// The condition code has already been set and must remain unchanged during the second dimension allocation loop in this code path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehsankianifar - Is the log I pointed out generates the code in this block ? I do not see CGFR instruction in the log.

Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehsankianifar Can I request you to rebase this one? I am picking the review on this again.

Enhances the multianewarray evaluator to support offheap allocation on
IBM Z platform. Offheap allocation for arrays with a zero-length second
dimension is not supported; in such cases, a helper call is triggered
to handle the allocation.

signed-off-by: Ehsan Kiani Far <ehsan.kianifar@gmail.com>
Enhances the multianewarray evaluator to support zero-length offheap
allocation on IBM Z platform.

signed-off-by: Ehsan Kiani Far <ehsan.kianifar@gmail.com>
Previously, inlining of constant 2D arrays with a non-zero second
dimension was disabled on z due to limitations in handling such cases.
However, recent improvements have resolved those issues, making it safe
and beneficial to inline these arrays.
Remove the constraint that prevented inlining of non-zero constant
second dimension 2D arrays.

signed-off-by: Ehsan Kiani Far <ehsan.kianifar@gmail.com>
@ehsankianifar ehsankianifar force-pushed the Z_AddOffheapSupportInMultianewarrayEvaluator branch 2 times, most recently from 0adb331 to 425601f Compare December 2, 2025 19:24
Refactored off-heap 2D array allocation logic to improve clarity and
maintainability on IBM Z platform.
Previously, STOCG was used to store the data address only when the
length was non-zero, relying on condition codes to control flow. This
approach introduced hidden dependencies on allocator condition codes,
making the code harder to read and prone to future bugs.
Updated the implementation to allocate non-zero length off-heap arrays
inline, while handling zero-length arrays out of line. This removes
reliance on condition codes for allocation decisions, simplifies
control flow, and ensures consistent behavior across allocation paths.

signed-off-by: Ehsan Kiani Far <ehsan.kianifar@gmail.com>
@ehsankianifar ehsankianifar force-pushed the Z_AddOffheapSupportInMultianewarrayEvaluator branch from 425601f to e3ab377 Compare December 2, 2025 19:36
@ehsankianifar
Copy link
Contributor Author

I moved the allocation of off-heap zero second dimension length leaf arrays OOL. Tests are passing locally and I started a few jenkin tests.
The compilation logs for off-heap 2D array for both comp refs and no comp refs are provided in the flowing logs files:
OffheapComp.log
OffheapNoComp.log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants