-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8371716: C2: Phi node fails Value()'s verification when speculative types clash #28331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
/contributor add rwestrel For the reproducer, now test. |
|
👋 Welcome back mchevalier! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
@marc-chevalier Syntax:
User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address. |
|
@marc-chevalier The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
|
/contributor add @rwestrel |
|
@marc-chevalier |
Webrevs
|
|
@marc-chevalier I tried to review the issue at hand, but it is a little tricky because the regression test uses one set of class names and your description another (valhalla reproducer). Would it be possible to adjust the explanation to your mainline reproducer? Or maybe even just annotating your mainline reproducer with comments would help already. I think it would be easier to review the PR once it is easy to follow the example. Also: the issue name should be adjusted to say something a bit more specific about the issue. We have a lot of "missed optimization opportunity" issues, and it would be nice to be able to tell them apart easily ;) |
| if (ft->speculative() == nullptr && t->speculative() != nullptr) { | ||
| const Type* first_ft = ft; | ||
| ft = t->filter_speculative(first_ft); | ||
| #ifdef ASSERT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a flyby rather than a review but I was wondering if it would make sense to extract this assert block since it is the same as the one above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We surely can. But I'd rather avoid premature clean up as it is very uncertain to me how this issue will evolve, and maybe fundamentally change.
|
I've edited the description. I'd nevertheless suggest not to care too much about the test not to get sidetracked into details that are irrelevant to the issue. I actually regret I didn't write that symbolically. |
test/hotspot/jtreg/compiler/igvn/ClashingSpeculativeTypePhiNode.java
Outdated
Show resolved
Hide resolved
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. I now had a closer look, and got a little confused in multiple places. I'm not sure I have the right mental model yet.
I was also a little surprised by the fact that we do have inconsistent profiling. But maybe that should not be surprising? That's where a closer look at the reproducer would help me follow, and annotations in the test itself would be fantastic ;)
I have not yet looked at all the debug code in-depth, as I'd like to first understand the rest.
|
|
||
| // In rare cases, `_type` and `t` have incompatible opinion on speculative type, resulting into a too small intersection | ||
| // (such as AnyNull), which is removed in cleanup_speculative. From that `ft` has empty speculative type. After the end | ||
| // of the current `Value` call, `ft` (that is returned) is becoming `_type`. If verification happens then, `t` would be the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // of the current `Value` call, `ft` (that is returned) is becoming `_type`. If verification happens then, `t` would be the | |
| // of the current `Value` call, `_type` is assigned the value of `ft`. If verification happens then, `t` would be the |
"becoming" is a bit ambiguous here, I first thought it meant ft = _type, then realized one could also read it as _type = ft. Maybe you have an even better way to express it.
|
|
||
| // In rare cases, `_type` and `t` have incompatible opinion on speculative type, resulting into a too small intersection | ||
| // (such as AnyNull), which is removed in cleanup_speculative. From that `ft` has empty speculative type. After the end | ||
| // of the current `Value` call, `ft` (that is returned) is becoming `_type`. If verification happens then, `t` would be the | ||
| // same (union of input types), but the new `_type` has now no speculative type, the result of `t->filter_speculative(_type)` | ||
| // has the speculative type of `t` (if it's not removed because e.g. the resulting type is exact and non null) and not empty | ||
| // (like the previously returned type). In such a case, doing the filtering one time more allows to reach a fixpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From that
fthas empty speculative type
I'm not very familiar with speculative types. Does "empty speculative" == TOP speculative type? Or rather "no speculative type", which essencially means it is BOTTOM type?
Because then if we filter x with TOP we should still get TOP, but if we filter with BOTTOM we get x. And that would fit better with your statement later on:
but the new
_typehas now no speculative type, the result oft->filter_speculative(_type)has the speculative type oft
Can you clarify please for my understanding? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And does cleanup_speculative happen during t->filter_speculative(_type), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if a minimal example would help here. I'm thinking something like this:
In rare cases, `_type` and `t` have incompatible opinion on speculative type, resulting into a too small intersection
t: Object (A)
_type: Object (B)
We filter them. Since A and B have no intersection, the speculative type is removed. This means the speculative type is implicitly "Object", and not TOP, as the intersection of A and B would suggest.
ft = t->filter_speculative(_type) = Object
After PhiNode::Value, we assign _type = ft. During verification, we run PhiNode::Value again, but this time:
t: Object (A) // same as above
_type: Object // ft from above
And now we get:
ft = t->filter_speculative(_type) = Object (A)
... argue about fixpoint next ...
It is a first proposal, and a little verbose... maybe you could find something more concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have no speculative type, it means we don't have a guess about what the type could be (more precisely than the actual type). You can say it's bottom, or that it's the same as the non-speculative type. And indeed, if you do t->filter_speculative(_type) when _type has no speculative type, the result has the same speculative type as t except when this one doesn't already pass cleanup_speculative, but then, it's still a fixpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see much value in your "minimal example". Isn't it just naming "A" and "B" instead of saying "speculative type of t" and "speculative type of _type"? I'm not fan of that: that just introduce more symbols, more names that are not in the code.
Also I'm very careful with examples, they are often misleading by giving a feeling that it's exactly how it happens. For instance Since A and B have no intersection is not always true, it just needs the intersection to be above centerline, a small enough intersection. I might rephrase later, but it seems you got what is happening and I first need a decision on the solution before spending time in cleanup that might be entirely erased. It seems that the fixpoint way was far from consensual.
| assert(ft == Type::TOP, ""); // Canonical empty value | ||
| } | ||
|
|
||
| else { | ||
|
|
||
| if (jt != ft && jt->base() == ft->base()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting looks a bit funny here.
| const Type* first_ft = ft; | ||
| ft = t->filter_speculative(first_ft); | ||
| #ifdef ASSERT | ||
| // The following logic has been moved into TypeOopPtr::filter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this mean? What logic are you referring to? The one here? But then you say it was moved to TypeOopPtr::filter ...but it is still here? Can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or are you saying it is moved "from" rather than "into", i.e. this is some sort of code duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #ifdef ASSERT | ||
| // The following logic has been moved into TypeOopPtr::filter. | ||
| const Type* jt = t->join_speculative(first_ft); | ||
| if (jt->empty()) { // Emptied out??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (jt->empty()) { // Emptied out??? | |
| if (jt->empty()) { |
Comment seems redundant.
| * -XX:CompileOnly=compiler.igvn.ClashingSpeculativeTypePhiNode::test1 | ||
| * -XX:CompileCommand=quiet | ||
| * -XX:TypeProfileLevel=222 | ||
| * -XX:+AlwaysIncrementalInline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * -XX:+AlwaysIncrementalInline | |
| * -XX:+AlwaysIncrementalInline |
GitHub Actions is giving us this:
Error: VM option 'AlwaysIncrementalInline' is develop and is available only in debug version of VM.
Improperly specified VM option 'AlwaysIncrementalInline'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Probably this would help: -XX:+IgnoreUnrecognizedVMOptions
|
I think reading the code and the comments to understand the situation might not be as good as reading the description of this PR. I regret I gave a reproducer and proposed a solution. Given the very obvious lack of consensus on the Valhalla PR, it is clear that this issue might evolve radically and that the proposed solution may not be the final one. Therefore, I will not do any cleanup before agreeing on the way to go, as it might very well be erased and it would be a very poor use of everybody's time. |
|
@marc-chevalier @merykitty @rwestrel I only just realized that the conversation from openjdk/valhalla#1717 still had relevance here. I had a very nice call with Marc, and he graciously explained things to me. I'll summarize my thoughts below. It seems to be possible that the local speculative type of the My question was if it is sane that we get this kind of inconsistent speculation: it must mean that our profiling has delivered somewhat inconsistent results. We would have to dig a bit deeper into the reproducer now, but it seems that an inlined (inner) method found one type during profiling, but the caller (outer method) found another at the phi. It would be interesting to understand a bit more what this implies. I could see these options:
We could now dig deep into if Roland's The question is now what we should do here. There are at least 3 options:
There is a bit of a question how much of an edge case this really is. Probably quite. But we also cannot rely on our fuzzers here, as they don't really produce Object Classes at all. Anyway: I'm the least qualified compared to @merykitty and @rwestrel , but I would vote for Marc's current proposal. Plus the required code cleanups, of course. |
|
I'm not sure if I'm correct, but I think speculative types themselves may not be consistent. For example, if they are consistent, then you will expect that the profiled types of the return values of a method Additionally, in the test case, the speculative type being empty is correct, the path is speculatively unreachable, maybe we can use that information to cut off the branches, simplify the CFG for better compilation? |
Attached is another test case that reproduces the same issue. I run that one with: $ for i in It usually fails after a few runs. That one has conflicting profile data but no dead path. What you're suggesting has some risk and unclear benefits so I think would need to be investigated separately. |
|
@rwestrel @merykitty @marc-chevalier Suggestion:
In future RFE's we could consider:
I suppose it depends on what our definition of "consistent" is for speculative types. Of course they can be wrong at runtime, that is the whole point of speculation, so in a sense that makes them "inconsistent". But we could at least enforce some "consistency" in the computation that we do with the speculative types: intersections and unions of types should be done "consistently", I suppose. I'm trying to reconstruct Quan-Anh's definition of "consistent":
Does this kind of problem not also arise for non-speculative types, at least during CCP or IGVN? First I'm thinking of Casts, but maybe those are special. They can have narrower type than their input type. What about Phi nodes? During IGVN, do we always expect the input types to be a subtype of the output type? I guess so? Observation: "widening" like that what happens here because of So according to Quan-Anh's definition, we start with an inconsistent speculative type. But would it not be nice if it eventually became consistent, specifically after CCP/IGVN. And we could already get local consistency after every |
I might have missed something, but which solution would you then include for now? |
The one you have now: one more round of filtering to get to fixpoint ;) |
Just a side node: We remove the speculative types after incremental inlining here: jdk/src/hotspot/share/opto/compile.cpp Lines 2359 to 2363 in cc5b35b
|
|
Interesting, and indeed, in what I know, we crash inside the |
This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway.
The proposed solution seems somewhat controversial. See openjdk/valhalla#1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go.
Analysis
Obervationally
IGVN
During IGVN, in
PhiNode::Value, aPhiNodehas 2 inputs. Their types are:We compute the join (HS' meet):
jdk/src/hotspot/share/opto/cfgnode.cpp
Lines 1299 to 1306 in 09b25cd
But the current
_type(of thePhiNodeas aTypeNode) isWe filter
tby_typejdk/src/hotspot/share/opto/cfgnode.cpp
Line 1321 in 09b25cd
and we get
which is what we return. After the end of
Value, the returned becomes the newPhiNode's_type.jdk/src/hotspot/share/opto/phaseX.cpp
Lines 2150 to 2164 in 09b25cd
and
jdk/src/hotspot/share/opto/node.cpp
Lines 1117 to 1123 in 09b25cd
Verification
On verification,
in(1),in(2)have the same value, so doest. But this timeand so after filtering
tby (new)_typeand we getwhich is retuned. Verification gets angry because the new
ftis not the same as the previous one.But why?!
Details on type computation
In short, we are doing
and observing that
ft != ft'. It seems our lattice doesn't ensure(a /\ b) /\ b = a /\ bwhich is problematic for this kind of verfication that will just "try again and see if something change".To me, the surprising fact was that the intersection
What happened to the speculative type? Both
C1andC2are inheritingObject. So the code correctly find that the intersection of these speculative type isThe interesting part is that it's
AnyNull: indeed, what else is aC1andC2at the same time? And then,above_centerlinedecides it's not useful enough (too precise, too close from HS' top/normal bottom) and remove the speculative type.jdk/src/hotspot/share/opto/type.cpp
Lines 2747 to 2749 in 09b25cd
But on the verification run,
java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *)is intersected with the speculative type ofjava/lang/Object *, which is unknown (HS' bottom/normal top), so we are simply gettingC2. If we did not discardAnyNullusingabove_centerline, we would have the intersection ofC2andAnyNull, givingAnyNull, which is indeed stable.Reproducing
In Valhalla
This crash is quite rare because:
PhiNode::Valueis called a second time, it will stabilize the_typefield before verification.To limitate the influence of 2., I've tested with an additional assert that would immediately do
in
PhiNode::Valueand compareftandft_. Indeed, we are never sure a run ofValueis not the last one: it should always be legal to stop anywhere (even if in a particular case), it was going to run further.With this extra check, the crash a bit more common, but still pretty rare. Tests that have been witness to crash then at least once:
compiler/valhalla/inlinetypes/TestCallingConvention.javacompiler/valhalla/inlinetypes/TestIntrinsics.javacompiler/valhalla/inlinetypes/TestArrays.javacompiler/valhalla/inlinetypes/TestBasicFunctionality.javaAll in
compiler/valhalla/inlinetypeswhile I was also testing with mainline tests. Suspicious, uh.In mainline
With the aforementioned extra check, I've tried to see if it could happen on mainline since the involved code seems not to be valhalla-specific. Yet, nothing failed.
Fortunately, Roland crafted an example that reproduces in mainline (and Valhalla)! You'll find it as a test here.
Fixing?
I think changing the type system would be quite risky: it is all over the place. Also, fixing would require not to drop the speculative type when
above_centerline. This might not be desirable. On top of the complexity and the associated risk, a too specific speculative type is rather useless. If we keep the too specific type around, we probably should ignore it where we make use of it. That's distributing the effort and open the door to inconsistencies. If we should ignore it, we might just as well drop it immediately. It is also dubious whether ordering requirement are meaningful for speculative type: they are not sound or complete. Moreover, one could argue that in the abstract, we don't even need a lattice or anything like that. A single poset whose functions are sound approximation of the concrete is enough. It is not uncommon that in the abstract world thata /\ bis not smaller thana.For instance, in the co-interval domain (the whole universe
Eminus an interval), if we takea = E \ [1, 2]andb = E \ [3, 4], the concrete intersection would beE \ [1, 2] \ [3, 4]which is not allowed in our domain since it has 2 holes. If we want a sound approximation of that, we must remove at least one hole. We can then takeE \ [1, 2] = a,E \ [3, 4] = bor evenE(and of course, a lot of other things, with smaller holes thanaorb). Whichever our abstract domain chooses, there is never botha /\ b < aanda /\ b < b. Indeed, this poset (like many real-world domains) is not a lattice, which doesn't keep us from speaking about soundness. An interesting and related fact is that there is no best abstraction ofE \ [1, 2] \ [3, 4].Maybe this digression about soundness rather hints that we should not compare speculative types during verification.
Finally, as a simple solution, one could simply run
filter_speculativetwice, that should be enough as the second filter will simply select the non empty speculative type if there is only one, and this one won't beabove_centerline, or it would not exist as a speculative type already. To try to be a bit less aggressive, we can rather do that in case where we know it cannot be useful. Ifftobtained fromfilter_speculativehas no speculative type, butthas, we can know that it might be because it has been dropped, and computingt->filter_speculative(ft)could pick the speculative type oft. The speculative type can still be removed if the non-speculative type offtis exact and non null for instance, but we've still reached a fixpoint, so it's correct, but a little bit too much work. This solution means that we are basically throwing away the speculative type in_typein case of clash. That sounds not shocking to me: if we get (hopefully) more accurate information on the inputs, it seems reasonable, or at least not shocking, to give up the previous speculative type if it clashes since it's not known to be correct. One can also phrase it as "we keep the speculative type of_typeonly if we don't have anything going against it".This PR includes the last solution.
Thanks,
Marc
Progress
Issue
Contributors
<roland@openjdk.org>Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28331/head:pull/28331$ git checkout pull/28331Update a local copy of the PR:
$ git checkout pull/28331$ git pull https://git.openjdk.org/jdk.git pull/28331/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28331View PR using the GUI difftool:
$ git pr show -t 28331Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28331.diff
Using Webrev
Link to Webrev Comment