Conversation

guy-david (Contributor) commented Oct 24, 2025

Some floating-point optimizations don't trigger because they can produce incorrect results around signed zeros; they therefore rely on the nsz flag, which is commonly present when fast-math is enabled.
However, this flag is not a hard requirement when every user of the combined value is either guaranteed to overwrite the sign bit or to simply ignore it (comparisons, etc.), as illustrated in the sketch after the list below.

The optimizations affected:

  • fadd x, +0.0 -> x
  • fsub x, -0.0 -> x
  • fsub +0.0, x -> fneg x
  • fdiv(x, sqrt(x)) -> sqrt(x)
  • frem lowering with power-of-2 divisors
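
As a concrete illustration (a minimal IR sketch, not taken from the patch; the function name is made up): fadd %x, +0.0 cannot normally be folded to %x without nsz, because %x = -0.0 would yield +0.0, but when the only user overwrites the sign bit the difference is unobservable.

; Hypothetical example: the lone user is fabs, which discards the sign bit,
; so folding the fadd away cannot change the observable result.
define float @fold_without_nsz(float %x) {
  %add = fadd float %x, 0.0
  %abs = call float @llvm.fabs.f32(float %add)  ; simplifies to fabs(%x)
  ret float %abs
}
declare float @llvm.fabs.f32(float)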

llvmbot (Member) commented Oct 24, 2025

@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-aarch64

Author: Guy David (guy-david)

Changes

[DAGCombiner] Relax nsz constraint for more FP optimizations

Some floating-point optimizations don't trigger because they can produce incorrect results around signed zeros, and rely on the existence of the nsz flag, which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of the combined value are either guaranteed to overwrite the sign bit or simply ignore it (comparisons, etc.).

The optimizations affected:

  • fadd x, -0.0 -> x
  • fsub x, 0.0 -> x
  • fsub -0.0, x -> fneg x
  • fdiv x, sqrt(x) -> sqrt(x)
  • frem lowering with power-of-2 divisors

Full diff: https://github.com/llvm/llvm-project/pull/165011.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+11-5)
  • (added) llvm/test/CodeGen/AArch64/nsz-bypass.ll (+72)
  • (modified) llvm/test/CodeGen/AMDGPU/swdev380865.ll (+2-3)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 73aed33fe0838..f2b4e74cc65c7 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -17781,7 +17781,8 @@ SDValue DAGCombiner::visitFADD(SDNode *N) {
   // N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
   ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
   if (N1C && N1C->isZero())
-    if (N1C->isNegative() || Flags.hasNoSignedZeros())
+    if (N1C->isNegative() || Flags.hasNoSignedZeros() ||
+        DAG.allUsesSignedZeroInsensitive(SDValue(N, 0)))
       return N0;
 
   if (SDValue NewSel = foldBinOpIntoSelect(N))
@@ -17993,7 +17994,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub A, 0) -> A
   if (N1CFP && N1CFP->isZero()) {
-    if (!N1CFP->isNegative() || Flags.hasNoSignedZeros()) {
+    if (!N1CFP->isNegative() || Flags.hasNoSignedZeros() ||
+        DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
       return N0;
     }
   }
@@ -18006,7 +18008,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub -0.0, N1) -> -N1
   if (N0CFP && N0CFP->isZero()) {
-    if (N0CFP->isNegative() || Flags.hasNoSignedZeros()) {
+    if (N0CFP->isNegative() || Flags.hasNoSignedZeros() ||
+        DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) {
       // We cannot replace an FSUB(+-0.0,X) with FNEG(X) when denormals are
       // flushed to zero, unless all users treat denorms as zero (DAZ).
       // FIXME: This transform will change the sign of a NaN and the behavior
@@ -18654,7 +18657,9 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
   }
 
   // Fold X/Sqrt(X) -> Sqrt(X)
-  if (Flags.hasNoSignedZeros() && Flags.hasAllowReassociation())
+  if ((Flags.hasNoSignedZeros() ||
+       DAG.allUsesSignedZeroInsensitive(SDValue(N, 0))) &&
+      Flags.hasAllowReassociation())
     if (N1.getOpcode() == ISD::FSQRT && N0 == N1.getOperand(0))
       return N1;
 
@@ -18706,7 +18711,8 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
       TLI.isOperationLegalOrCustom(ISD::FTRUNC, VT) &&
       DAG.isKnownToBeAPowerOfTwoFP(N1)) {
     bool NeedsCopySign =
-        !Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0);
+        !Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0) &&
+        !DAG.allUsesSignedZeroInsensitive(SDValue(N, 0));
     SDValue Div = DAG.getNode(ISD::FDIV, DL, VT, N0, N1);
     SDValue Rnd = DAG.getNode(ISD::FTRUNC, DL, VT, Div);
     SDValue MLA;
diff --git a/llvm/test/CodeGen/AArch64/nsz-bypass.ll b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
new file mode 100644
index 0000000000000..3b17e410ac380
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/nsz-bypass.ll
@@ -0,0 +1,72 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=aarch64 | FileCheck %s
+
+; Test that nsz constraint can be bypassed when all uses are sign-insensitive.
+
+define i1 @test_fadd_neg_zero_fcmp(float %x) {
+; CHECK-LABEL: test_fadd_neg_zero_fcmp:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov s1, #1.00000000
+; CHECK-NEXT:    fcmp s0, s1
+; CHECK-NEXT:    cset w0, eq
+; CHECK-NEXT:    ret
+  %add = fadd float %x, -0.0
+  %cmp = fcmp oeq float %add, 1.0
+  ret i1 %cmp
+}
+
+define float @test_fsub_zero_fabs(float %x) {
+; CHECK-LABEL: test_fsub_zero_fabs:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fabs s0, s0
+; CHECK-NEXT:    ret
+  %sub = fsub float %x, 0.0
+  %abs = call float @llvm.fabs.f32(float %sub)
+  ret float %abs
+}
+
+define float @test_fsub_neg_zero_copysign(float %x, float %y) {
+; CHECK-LABEL: test_fsub_neg_zero_copysign:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mvni v2.4s, #128, lsl #24
+; CHECK-NEXT:    // kill: def $s0 killed $s0 def $q0
+; CHECK-NEXT:    // kill: def $s1 killed $s1 def $q1
+; CHECK-NEXT:    bif v0.16b, v1.16b, v2.16b
+; CHECK-NEXT:    // kill: def $s0 killed $s0 killed $q0
+; CHECK-NEXT:    ret
+  %sub = fsub float -0.0, %x
+  %copysign = call float @llvm.copysign.f32(float %sub, float %y)
+  ret float %copysign
+}
+
+define i1 @test_div_sqrt_fcmp(float %x) {
+; CHECK-LABEL: test_div_sqrt_fcmp:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fsqrt s0, s0
+; CHECK-NEXT:    fcmp s0, #0.0
+; CHECK-NEXT:    cset w0, gt
+; CHECK-NEXT:    ret
+  %sqrt = call float @llvm.sqrt.f32(float %x)
+  %div = fdiv reassoc float %x, %sqrt
+  %cmp = fcmp ogt float %div, 0.0
+  ret i1 %cmp
+}
+
+define float @test_frem_fabs(float %x) {
+; CHECK-LABEL: test_frem_fabs:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fmov s1, #0.50000000
+; CHECK-NEXT:    fmov s2, #-2.00000000
+; CHECK-NEXT:    fmul s1, s0, s1
+; CHECK-NEXT:    frintz s1, s1
+; CHECK-NEXT:    fmadd s0, s1, s2, s0
+; CHECK-NEXT:    fabs s0, s0
+; CHECK-NEXT:    ret
+  %rem = frem float %x, 2.0
+  %abs = call float @llvm.fabs.f32(float %rem)
+  ret float %abs
+}
+
+declare float @llvm.fabs.f32(float)
+declare float @llvm.copysign.f32(float, float)
+declare float @llvm.sqrt.f32(float)
diff --git a/llvm/test/CodeGen/AMDGPU/swdev380865.ll b/llvm/test/CodeGen/AMDGPU/swdev380865.ll
index d4a8a0d762afd..1130c465c15e3 100644
--- a/llvm/test/CodeGen/AMDGPU/swdev380865.ll
+++ b/llvm/test/CodeGen/AMDGPU/swdev380865.ll
@@ -28,14 +28,13 @@ define amdgpu_kernel void @_Z6kernelILi4000ELi1EEvPd(ptr addrspace(1) %x.coerce)
 ; CHECK-NEXT:    v_mov_b32_e32 v1, s7
 ; CHECK-NEXT:  .LBB0_1: ; %for.cond4.preheader
 ; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    v_add_f64 v[0:1], v[0:1], 0
 ; CHECK-NEXT:    s_mov_b32 s6, 0
 ; CHECK-NEXT:    s_mov_b32 s7, 0x40140000
-; CHECK-NEXT:    s_add_i32 s1, s1, s0
-; CHECK-NEXT:    s_cmpk_lt_i32 s1, 0xa00
 ; CHECK-NEXT:    v_add_f64 v[0:1], v[0:1], s[6:7]
 ; CHECK-NEXT:    s_mov_b32 s6, 0
 ; CHECK-NEXT:    s_mov_b32 s7, 0x40180000
+; CHECK-NEXT:    s_add_i32 s1, s1, s0
+; CHECK-NEXT:    s_cmpk_lt_i32 s1, 0xa00
 ; CHECK-NEXT:    v_add_f64 v[0:1], v[0:1], s[6:7]
 ; CHECK-NEXT:    s_mov_b32 s6, 0
 ; CHECK-NEXT:    s_mov_b32 s7, 0x401c0000

github-actions bot commented Oct 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch 2 times, most recently from 5148b0d to 485c176 on November 12, 2025 10:54
jayfoad (Contributor) commented Nov 12, 2025

The optimizations affected:

  • fadd x, -0.0 -> x
  • fsub x, 0.0 -> x
  • fsub -0.0, x -> fneg x

IIUC these cases are always allowed. Your patch enables:

  • fadd x, +0.0 -> x
  • fsub x, -0.0 -> x
  • fsub +0.0, x -> fneg x
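
To spell out the signed-zero arithmetic behind that distinction (an illustrative snippet, not from the patch; the function name is made up):

define void @signed_zero_cases(float %x) {
  ; x + (-0.0) == x for every x, including zeros:
  ;   (+0.0) + (-0.0) = +0.0 and (-0.0) + (-0.0) = -0.0, so no nsz is needed.
  %always_ok = fadd float %x, -0.0
  ; (-0.0) + (+0.0) = +0.0, which differs from x when x = -0.0, so this fold
  ; needs nsz or users that never observe the sign of a zero result.
  %needs_nsz = fadd float %x, 0.0
  ret void
}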

@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch from 485c176 to a508708 on November 12, 2025 14:33
guy-david (Contributor, Author) replied:
Right, thank you for the correction.

@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch from a508708 to c491e1c on November 17, 2025 11:04
@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch from c491e1c to 01e872d on November 17, 2025 11:05
@guy-david guy-david changed the base branch from users/guy-david/dag-combine-fp-to-int-to-fp-sign-insensitive to main November 17, 2025 11:05
@guy-david guy-david changed the title [DAGCombiner] Relax nsz constraint for more FP optimizations [DAGCombiner] Relax nsz constraint for FP optimizations Nov 17, 2025
Some floating-point optimizations don't trigger because they can produce
incorrect results around signed zeros, and rely on the existence of the
nsz flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of
the combined value are either guaranteed to overwrite the sign-bit or
simply ignore it (comparisons, etc.).

The optimizations affected:
- fadd x, +0.0 -> x
- fsub x, -0.0 -> x
- fsub +0.0, x -> fneg x
- fdiv(x, sqrt(x)) -> sqrt(x)
- frem lowering with power-of-2 divisors
@guy-david guy-david force-pushed the users/guy-david/dag-combine-sign-insensitive branch from 01e872d to 3b586e0 on November 17, 2025 11:10