Conversation

guy-david (Contributor) commented Oct 24, 2025

Some floating-point optimizations don't trigger because they can produce incorrect results around signed zeros; they therefore rely on the nsz flag, which is commonly present when fast-math is enabled.
However, this flag is not a hard requirement when every user of the combined value is either guaranteed to overwrite the sign bit or to simply ignore it (comparisons, etc.), as illustrated in the sketch after the list below.

The optimizations affected:

  • fadd x, +0.0 -> x
  • fsub x, -0.0 -> x
  • fsub +0.0, x -> fneg x
  • fdiv(x, sqrt(x)) -> sqrt(x)
  • frem lowering with power-of-2 divisors
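
As a concrete illustration (a minimal IR sketch, not taken from the patch; the function name is made up): fadd %x, +0.0 cannot normally be folded to %x without nsz, because %x = -0.0 would yield +0.0, but when the only user overwrites the sign bit the difference is unobservable.

; Hypothetical example: the lone user is fabs, which discards the sign bit,
; so folding the fadd away cannot change the observable result.
define float @fold_without_nsz(float %x) {
  %add = fadd float %x, 0.0
  %abs = call float @llvm.fabs.f32(float %add)  ; simplifies to fabs(%x)
  ret float %abs
}
declare float @llvm.fabs.f32(float)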

llvmbot (Member) commented Oct 24, 2025

@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-aarch64

Author: Guy David (guy-david)

Changes

[DAGCombiner] Relax nsz constraint for more FP optimizations

Some floating-point optimizations don't trigger because they can produce incorrect results around signed zeros, and rely on the existence of the nsz flag, which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of the combined value are either guaranteed to overwrite the sign bit or simply ignore it (comparisons, etc.).

The optimizations affected:

  • fadd x, -0.0 -> x
  • fsub x, 0.0 -> x
  • fsub -0.0, x -> fneg x
  • fdiv x, sqrt(x) -> sqrt(x)
  • frem lowering with power-of-2 divisors

Full diff: https://github.com/llvm/llvm-project/pull/165011.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+11-5)
  • (added) llvm/test/CodeGen/AArch64/nsz-bypass.ll (+72)
  • (modified) llvm/test/CodeGen/AMDGPU/swdev380865.ll (+2-3)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 73aed33fe0838..f2b4e74cc65c7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -17781,7 +17781,8 @@ SDValue DAGCombiner::visitFADD(SDNode *N) {
   // N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
   ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
   if (N1C && N1C->isZero())
-    if (N1C->isNegative() || Flags.hasNoSignedZeros())
+    if (N1C->isNegative() || Flags.hasNoSignedZeros() ||
+        DAG.allUsesSignedZeroInsensitive(SDValue(N, 0)))
       return N0;
 
   if (SDValue NewSel = foldBinOpIntoSelect(N))
@@ -17993,7 +17994,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub A, 0) -> A
   if (N1CFP && N1CFP->isZero()) {
-    if (!N1CFP->isNegative() || Flags.hasNoSignedZeros()) {
+    if (!N1CFP->isNegative() || Flags.hasNoSignedZeros() ||
+        DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
       return N0;
     }
   }
@@ -18006,7 +18008,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub -0.0, N1) -> -N1
   if (N0CFP && N0CFP->isZero()) {
-    if (N0CFP->isNegative() || Flags.hasNoSignedZeros()) {
+    if (N0CFP->isNegative() || Flags.hasNoSignedZeros() ||
+        DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
       // We cannot replace an FSUB(+-0.0,X) with FNEG(X) when denormals are
       // flushed to zero, unless all users treat denorms as zero (DAZ).
       // FIXME: This transform will change the sign of a NaN and the behavior
@@ -18654,7 +18657,9 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
   }
 
   // Fold X/Sqrt(X) -> Sqrt(X)
-  if (Flags.hasNoSignedZeros() && Flags.hasAllowReassociation())
+  if ((Flags.hasNoSignedZeros() ||
+       DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) &&
+      Flags.hasAllowReassociation())
     if (N1.getOpcode() == ISD::FSQRT && N0 == N1.getOperand(0))
       return N1;
 
@@ -18706,7 +18711,8 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
       TLI.isOperationLegalOrCustom(ISD::FTRUNC, VT) &&
       DAG.isKnownToBeAPowerOfTwoFP(N1)) {
     bool NeedsCopySign =
-        !Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0);
+        !Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0) &&
+        !DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
     SDValue Div = DAG.getNode(ISD::FDIV, DL, VT, N0, N1);
     SDValue Rnd = DAG.getNode(ISD::FTRUNC, DL, VT, Div);
     SDValue MLA;
diff --git a/llvm/test/CodeGen/AArch64/nsz-bypass.ll b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
new file mode 100644
index 0000000000000..3b17e410ac380
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
@@ -0,0 +1,72 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=aarch64 | FileCheck %s
+
+; Test that nsz constraint can be bypassed when all uses are sign-insensitive.
+
+define i1 @test_fadd_neg_zero_fcmp(float %x) {
+; CHECK-LABEL: test_fadd_neg_zero_fcmp:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov s1, #1.00000000
+; CHECK-NEXT:    fcmp s0, s1
+; CHECK-NEXT:    cset w0, eq
+; CHECK-NEXT:    ret
+  %add = fadd float %x, -0.0
+  %cmp = fcmp oeq float %add, 1.0
+  ret i1 %cmp
+}
+
+define float @test_fsub_zero_fabs(float %x) {
+; CHECK-LABEL: test_fsub_zero_fabs:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fabs s0, s0
+; CHECK-NEXT:    ret
+  %sub = fsub float %x, 0.0
+  %abs = call float @llvm.fabs.f32(float %sub)
+  ret float %abs
+}
+
+define float @test_fsub_neg_zero_copysign(float %x, float %y) {
+; CHECK-LABEL: test_fsub_neg_zero_copysign:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mvni v2.4s, #128, lsl #24
+; CHECK-NEXT:    // kill: def $s0 killed $s0 def $q0
+; CHECK-NEXT:    // kill: def $s1 killed $s1 def $q1
+; CHECK-NEXT:    bif v0.16b, v1.16b, v2.16b
+; CHECK-NEXT:    // kill: def $s0 killed $s0 killed $q0
+; CHECK-NEXT:    ret
+  %sub = fsub float -0.0, %x
+  %copysign = call float @llvm.copysign.f32(float %sub, float %y)
+  ret float %copysign
+}
+
+define i1 @test_div_sqrt_fcmp(float %x) {
+; CHECK-LABEL: test_div_sqrt_fcmp:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fsqrt s0, s0
+; CHECK-NEXT:    fcmp s0, #0.0
+; CHECK-NEXT:    cset w0, gt
+; CHECK-NEXT:    ret
+  %sqrt = call float @llvm.sqrt.f32(float %x)
+  %div = fdiv reassoc float %x, %sqrt
+  %cmp = fcmp ogt float %div, 0.0
+  ret i1 %cmp
+}
+
+define float @test_frem_fabs(float %x) {
+; CHECK-LABEL: test_frem_fabs:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov s1, #0.50000000
+; CHECK-NEXT:    fmov s2, #-2.00000000
+; CHECK-NEXT:    fmul s1, s0, s1
+; CHECK-NEXT:    frintz s1, s1
+; CHECK-NEXT:    fmadd s0, s1, s2, s0
+; CHECK-NEXT:    fabs s0, s0
+; CHECK-NEXT:    ret
+  %rem = frem float %x, 2.0
+  %abs = call float @llvm.fabs.f32(float %rem)
+  ret float %abs
+}
+
+declare float @llvm.fabs.f32(float)
+declare float @llvm.copysign.f32(float, float)
+declare float @llvm.sqrt.f32(float)
diff --git a/llvm/test/CodeGen/AMDGPU/swdev380865.ll b/llvm/test/CodeGen/AMDGPU/swdev380865.ll
index d4a8a0d762afd..1130c465c15e3 100644
--- a/llvm/test/CodeGen/AMDGPU/swdev380865.ll
+++ b/llvm/test/CodeGen/AMDGPU/swdev380865.ll
@@ -28,14 +28,13 @@ define amdgpu_kernel void @_Z6kernelILi4000ELi1EEvPd(ptr addrspace(1) %x.coerce)
 ; CHECK-NEXT:    v_mov_b32_e32 v1, s7
 ; CHECK-NEXT:  .LBB0_1: ; %for.cond4.preheader
 ; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    v_add_f64 v[0:1], v[0:1], 0
 ; CHECK-NEXT:    s_mov_b32 s6, 0
 ; CHECK-NEXT:    s_mov_b32 s7, 0x40140000
-; CHECK-NEXT:    s_add_i32 s1, s1, s0
-; CHECK-NEXT:    s_cmpk_lt_i32 s1, 0xa00
 ; CHECK-NEXT:    v_add_f64 v[0:1], v[0:1], s[6:7]
 ; CHECK-NEXT:    s_mov_b32 s6, 0
 ; CHECK-NEXT:    s_mov_b32 s7, 0x40180000
+; CHECK-NEXT:    s_add_i32 s1, s1, s0
+; CHECK-NEXT:    s_cmpk_lt_i32 s1, 0xa00
 ; CHECK-NEXT:    v_add_f64 v[0:1], v[0:1], s[6:7]
 ; CHECK-NEXT:    s_mov_b32 s6, 0
 ; CHECK-NEXT:    s_mov_b32 s7, 0x401c0000

github-actions bot commented Oct 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch 2 times, most recently from 5148b0d to 485c176 on November 12, 2025 10:54
jayfoad (Contributor) commented Nov 12, 2025

The optimizations affected:

  • fadd x, -0.0 -> x
  • fsub x, 0.0 -> x
  • fsub -0.0, x -> fneg x

IIUC these cases are always allowed. Your patch enables:

  • fadd x, +0.0 -> x
  • fsub x, -0.0 -> x
  • fsub +0.0, x -> fneg x
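
To spell out the signed-zero arithmetic behind that distinction (an illustrative snippet, not from the patch; the function name is made up):

define void @signed_zero_cases(float %x) {
  ; x + (-0.0) == x for every x, including zeros:
  ;   (+0.0) + (-0.0) = +0.0 and (-0.0) + (-0.0) = -0.0, so no nsz is needed.
  %always_ok = fadd float %x, -0.0
  ; (-0.0) + (+0.0) = +0.0, which differs from x when x = -0.0, so this fold
  ; needs nsz or users that never observe the sign of a zero result.
  %needs_nsz = fadd float %x, 0.0
  ret void
}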

@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch from 485c176 to a508708 on November 12, 2025 14:33
guy-david (Contributor, Author) replied:
Right, thank you for the correction.

@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch from a508708 to c491e1c on November 17, 2025 11:04
@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch from c491e1c to 01e872d on November 17, 2025 11:05
@guy-david guy-david changed the base branch from users/guy-david/dag-combine-fp-to-int-to-fp-sign-insensitive to main November 17, 2025 11:05
@guy-david guy-david changed the title [DAGCombiner] Relax nsz constraint for more FP optimizations [DAGCombiner] Relax nsz constraint for FP optimizations Nov 17, 2025
Some floating-point optimizations don't trigger because they can produce
incorrect results around signed zeros, and rely on the existence of the
nsz flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of
the combined value are either guaranteed to overwrite the sign-bit or
simply ignore it (comparisons, etc.).

The optimizations affected:
- fadd x, +0.0 -> x
- fsub x, -0.0 -> x
- fsub +0.0, x -> fneg x
- fdiv(x, sqrt(x)) -> sqrt(x)
- frem lowering with power-of-2 divisors
@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch from 01e872d to 3b586e0 on November 17, 2025 11:10