Commit 1e5bb41

rsc authored and gopherbot committed
cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u on 32-bit systems, with the effect that constant division of both int64s and uint64s can now be emitted directly in all cases, and also that bits.Mul64 can be intrinsified on 32-bit systems.

Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF was implemented as uint32 divisions by c and some fixup. After expanding those smaller constant divisions, the code for i/999 required:

	(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
	(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
	(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)

For that much code, we might as well use a full 64x64->128 multiply that can be used for all divisors, not just small ones. Having done that, the same i/999 now generates:

	(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
	(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
	(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)

The size increase on 386 is due to a few extra register spills. The size increase on mips is due to add-with-carry being hard.

The new approach is more general, letting us delete the old special case and guarantee that all int64 and uint64 divisions by constants are generated directly on 32-bit systems.

This especially speeds up code making heavy use of bits.Mul64 with a constant argument, which happens in strconv and various crypto packages. A few examples are benchmarked below.

pkg: cmd/compile/internal/test
benchmark \ host                local         linux-amd64   s7            linux-386     s7:GOARCH=386
                                vs base       vs base       vs base       vs base       vs base
DivconstI64                     ~             ~             ~             -49.66%       -21.02%
ModconstI64                     ~             ~             ~             -13.45%       +14.52%
DivisiblePow2constI64           ~             ~             ~             +0.97%        -1.32%
DivisibleconstI64               ~             ~             ~             -20.01%       -48.28%
DivisibleWDivconstI64           ~             ~             -1.76%        -38.59%       -42.74%
DivconstU64/3                   ~             ~             ~             -13.82%       -4.09%
DivconstU64/5                   ~             ~             ~             -14.10%       -3.54%
DivconstU64/37                  -2.07%        -4.45%        ~             -19.60%       -9.55%
DivconstU64/1234567             ~             ~             ~             -61.55%       -56.93%
ModconstU64                     ~             ~             ~             -6.25%        ~
DivisibleconstU64               ~             ~             ~             -2.78%        -7.82%
DivisibleWDivconstU64           ~             ~             ~             +4.23%        +2.56%

pkg: math/bits
benchmark \ host                s7            linux-amd64   linux-386     s7:GOARCH=386
                                vs base       vs base       vs base       vs base
Add                             ~             ~             ~             ~
Add32                           +1.59%        ~             ~             ~
Add64                           ~             ~             ~             ~
Add64multiple                   ~             ~             ~             ~
Sub                             ~             ~             ~             ~
Sub32                           ~             ~             ~             ~
Sub64                           ~             ~             -9.20%        ~
Sub64multiple                   ~             ~             ~             ~
Mul                             ~             ~             ~             ~
Mul32                           ~             ~             ~             ~
Mul64                           ~             ~             -41.58%       -53.21%
Div                             ~             ~             ~             ~
Div32                           ~             ~             ~             ~
Div64                           ~             ~             ~             ~

pkg: strconv
benchmark \ host                s7            linux-amd64   linux-386     s7:GOARCH=386
                                vs base       vs base       vs base       vs base
ParseInt/Pos/7bit               ~             ~             -11.08%       -6.75%
ParseInt/Pos/26bit              ~             ~             -13.65%       -11.02%
ParseInt/Pos/31bit              ~             ~             -14.65%       -9.71%
ParseInt/Pos/56bit              -1.80%        ~             -17.97%       -10.78%
ParseInt/Pos/63bit              ~             ~             -13.85%       -9.63%
ParseInt/Neg/7bit               ~             ~             -12.14%       -7.26%
ParseInt/Neg/26bit              ~             ~             -14.18%       -9.81%
ParseInt/Neg/31bit              ~             ~             -14.51%       -9.02%
ParseInt/Neg/56bit              ~             ~             -15.79%       -9.79%
ParseInt/Neg/63bit              ~             ~             -15.68%       -11.07%
AppendFloat/Decimal             ~             ~             -7.25%        -12.26%
AppendFloat/Float               ~             ~             -15.96%       -19.45%
AppendFloat/Exp                 ~             ~             -13.96%       -17.76%
AppendFloat/NegExp              ~             ~             -14.89%       -20.27%
AppendFloat/LongExp             ~             ~             -12.68%       -17.97%
AppendFloat/Big                 ~             ~             -11.10%       -16.64%
AppendFloat/BinaryExp           ~             ~             ~             ~
AppendFloat/32Integer           ~             ~             -10.05%       -10.91%
AppendFloat/32ExactFraction     ~             ~             -8.93%        -13.00%
AppendFloat/32Point             ~             ~             -10.36%       -14.89%
AppendFloat/32Exp               ~             ~             -9.88%        -13.54%
AppendFloat/32NegExp            ~             ~             -10.16%       -14.26%
AppendFloat/32Shortest          ~             ~             -11.39%       -14.96%
AppendFloat/32Fixed8Hard        ~             ~             ~             -2.31%
AppendFloat/32Fixed9Hard        ~             ~             ~             -7.01%
AppendFloat/64Fixed1            ~             ~             -2.83%        -8.23%
AppendFloat/64Fixed2            ~             ~             ~             -7.94%
AppendFloat/64Fixed3            ~             ~             -4.07%        -7.22%
AppendFloat/64Fixed4            ~             ~             -7.24%        -7.62%
AppendFloat/64Fixed12           ~             ~             -6.57%        -4.82%
AppendFloat/64Fixed16           ~             ~             -4.00%        -5.81%
AppendFloat/64Fixed12Hard       -2.22%        ~             -4.07%        -6.35%
AppendFloat/64Fixed17Hard       -2.12%        ~             ~             -3.79%
AppendFloat/64Fixed18Hard       -1.89%        ~             +2.48%        ~
AppendFloat/Slowpath64          -1.85%        ~             -14.49%       -18.21%
AppendFloat/SlowpathDenormal64  ~             ~             -13.08%       -19.41%

pkg: crypto/internal/fips140/nistec/fiat
benchmark \ host                s7            linux-amd64   linux-386     s7:GOARCH=386
                                vs base       vs base       vs base       vs base
Mul/P224                        ~             ~             -29.95%       -39.60%
Mul/P384                        ~             ~             -37.11%       -63.33%
Mul/P521                        ~             ~             -26.62%       -12.42%
Square/P224                     +1.46%        ~             -40.62%       -49.18%
Square/P384                     ~             ~             -45.51%       -69.68%
Square/P521                     +90.37%       ~             -25.26%       -11.23%

(The +90% is a separate problem and not real; that much variation can be seen on that system by running the same binary from two different files.)

pkg: crypto/internal/fips140/edwards25519
benchmark \ host                s7            linux-amd64   linux-386     s7:GOARCH=386
                                vs base       vs base       vs base       vs base
EncodingDecoding                ~             ~             -34.67%       -35.75%
ScalarBaseMult                  ~             ~             -31.25%       -30.29%
ScalarMult                      ~             ~             -33.45%       -32.54%
VarTimeDoubleScalarBaseMult     ~             ~             -33.78%       -33.68%

Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
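The link between a 64x64->128 multiply and constant division can be sketched in Go. This is an illustration only: the magic constants are the textbook ones for 3 and 7, the helper names `div3`, `div7`, and `avg64u` are hypothetical, and the constants the compiler actually selects may differ. Dividing by 7 needs a 65-bit multiplier, which is where an overflow-free average (the role Avg64u plays) comes in; the sketch assumes the first argument of `avg64u` is at least as large as the second.

```go
package main

import (
	"fmt"
	"math/bits"
)

// div3 computes x/3 from the high word of a 64x64->128 multiply.
// 0xAAAAAAAAAAAAAAAB is (2^65+1)/3, so x/3 == (x*m) >> 65.
func div3(x uint64) uint64 {
	hi, _ := bits.Mul64(x, 0xAAAAAAAAAAAAAAAB)
	return hi >> 1
}

// avg64u returns (x+y)/2 without overflowing the intermediate sum.
// Assumption of this sketch: x >= y.
func avg64u(x, y uint64) uint64 {
	return (x-y)>>1 + y
}

// div7 uses the 65-bit multiplier 2^64 + 0x2492492492492493, which is
// ceil(2^67/7). Then x/7 == (x + hi) >> 3, where hi is the high word of
// x * 0x2492492492492493. The sum x + hi may need 65 bits, so one of the
// three shift steps is folded into avg64u. hi <= x always holds here.
func div7(x uint64) uint64 {
	hi, _ := bits.Mul64(x, 0x2492492492492493)
	return avg64u(x, hi) >> 2
}

func main() {
	for _, x := range []uint64{0, 1, 6, 7, 999, 1 << 63, 1<<64 - 1} {
		fmt.Println(x, div3(x) == x/3, div7(x) == x/7)
	}
}
```

On a 64-bit target `bits.Mul64` is typically a single instruction; the point of this CL is that 32-bit targets can now expand the same operation inline.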
1 parent 38317c4 commit 1e5bb41

23 files changed: +664 -343 lines changed

src/cmd/compile/internal/arm/ssa.go

Lines changed: 1 addition & 0 deletions
@@ -245,6 +245,7 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
 		p.To.Type = obj.TYPE_REG
 		p.To.Reg = r
 	case ssa.OpARMADDS,
+		ssa.OpARMADCS,
 		ssa.OpARMSUBS:
 		r := v.Reg0()
 		r1 := v.Args[0].Reg()

src/cmd/compile/internal/ssa/_gen/386.rules

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 (Add(32|64)F ...) => (ADDS(S|D) ...)
 (Add32carry ...) => (ADDLcarry ...)
 (Add32withcarry ...) => (ADCL ...)
+(Add32carrywithcarry ...) => (ADCLcarry ...)
 
 (Sub(Ptr|32|16|8) ...) => (SUBL ...)
 (Sub(32|64)F ...) => (SUBS(S|D) ...)
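The three carry ops form a chain: Add32carry produces a carry, Add32carrywithcarry both consumes and produces one, and Add32withcarry only consumes one. As a rough analogy (not the compiler's code), a three-limb addition in Go using `bits.Add32` has the same shape. The `u96` type and `add96` function are hypothetical names used only for this sketch.

```go
package main

import (
	"fmt"
	"math/bits"
)

// u96 is a hypothetical 96-bit unsigned value split into 32-bit limbs.
type u96 struct{ lo, mid, hi uint32 }

// add96 adds two 96-bit values limb by limb, dropping the final carry.
func add96(a, b u96) u96 {
	lo, c := bits.Add32(a.lo, b.lo, 0)    // like Add32carry: carry out only
	mid, c := bits.Add32(a.mid, b.mid, c) // like Add32carrywithcarry: carry in and out
	hi, _ := bits.Add32(a.hi, b.hi, c)    // like Add32withcarry: carry in only
	return u96{lo, mid, hi}
}

func main() {
	// A carry out of the low limb ripples through the middle limb.
	fmt.Println(add96(u96{0xFFFFFFFF, 0xFFFFFFFF, 0}, u96{1, 0, 0}))
}
```

Before this change only the first and last links existed as ops, which is enough for a two-limb (64-bit) add but not for the longer chains a 128-bit product needs.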

src/cmd/compile/internal/ssa/_gen/386Ops.go

Lines changed: 22 additions & 20 deletions
@@ -90,22 +90,23 @@ func init() {
 
 	// Common regInfo
 	var (
-		gp01      = regInfo{inputs: nil, outputs: gponly}
-		gp11      = regInfo{inputs: []regMask{gp}, outputs: gponly}
-		gp11sp    = regInfo{inputs: []regMask{gpsp}, outputs: gponly}
-		gp11sb    = regInfo{inputs: []regMask{gpspsb}, outputs: gponly}
-		gp21      = regInfo{inputs: []regMask{gp, gp}, outputs: gponly}
-		gp11carry = regInfo{inputs: []regMask{gp}, outputs: []regMask{gp, 0}}
-		gp21carry = regInfo{inputs: []regMask{gp, gp}, outputs: []regMask{gp, 0}}
-		gp1carry1 = regInfo{inputs: []regMask{gp}, outputs: gponly}
-		gp2carry1 = regInfo{inputs: []regMask{gp, gp}, outputs: gponly}
-		gp21sp    = regInfo{inputs: []regMask{gpsp, gp}, outputs: gponly}
-		gp21sb    = regInfo{inputs: []regMask{gpspsb, gpsp}, outputs: gponly}
-		gp21shift = regInfo{inputs: []regMask{gp, cx}, outputs: []regMask{gp}}
-		gp11div   = regInfo{inputs: []regMask{ax, gpsp &^ dx}, outputs: []regMask{ax}, clobbers: dx}
-		gp21hmul  = regInfo{inputs: []regMask{ax, gpsp}, outputs: []regMask{dx}, clobbers: ax}
-		gp11mod   = regInfo{inputs: []regMask{ax, gpsp &^ dx}, outputs: []regMask{dx}, clobbers: ax}
-		gp21mul   = regInfo{inputs: []regMask{ax, gpsp}, outputs: []regMask{dx, ax}}
+		gp01           = regInfo{inputs: nil, outputs: gponly}
+		gp11           = regInfo{inputs: []regMask{gp}, outputs: gponly}
+		gp11sp         = regInfo{inputs: []regMask{gpsp}, outputs: gponly}
+		gp11sb         = regInfo{inputs: []regMask{gpspsb}, outputs: gponly}
+		gp21           = regInfo{inputs: []regMask{gp, gp}, outputs: gponly}
+		gp11carry      = regInfo{inputs: []regMask{gp}, outputs: []regMask{gp, 0}}
+		gp21carry      = regInfo{inputs: []regMask{gp, gp}, outputs: []regMask{gp, 0}}
+		gp1carry1      = regInfo{inputs: []regMask{gp}, outputs: gponly}
+		gp2carry1      = regInfo{inputs: []regMask{gp, gp}, outputs: gponly}
+		gp2carry1carry = regInfo{inputs: []regMask{gp, gp}, outputs: []regMask{gp, 0}}
+		gp21sp         = regInfo{inputs: []regMask{gpsp, gp}, outputs: gponly}
+		gp21sb         = regInfo{inputs: []regMask{gpspsb, gpsp}, outputs: gponly}
+		gp21shift      = regInfo{inputs: []regMask{gp, cx}, outputs: []regMask{gp}}
+		gp11div        = regInfo{inputs: []regMask{ax, gpsp &^ dx}, outputs: []regMask{ax}, clobbers: dx}
+		gp21hmul       = regInfo{inputs: []regMask{ax, gpsp}, outputs: []regMask{dx}, clobbers: ax}
+		gp11mod        = regInfo{inputs: []regMask{ax, gpsp &^ dx}, outputs: []regMask{dx}, clobbers: ax}
+		gp21mul        = regInfo{inputs: []regMask{ax, gpsp}, outputs: []regMask{dx, ax}}
 
 		gp2flags = regInfo{inputs: []regMask{gpsp, gpsp}}
 		gp1flags = regInfo{inputs: []regMask{gpsp}}
@@ -181,10 +182,11 @@ func init() {
 		{name: "ADDL", argLength: 2, reg: gp21sp, asm: "ADDL", commutative: true, clobberFlags: true}, // arg0 + arg1
 		{name: "ADDLconst", argLength: 1, reg: gp11sp, asm: "ADDL", aux: "Int32", typ: "UInt32", clobberFlags: true}, // arg0 + auxint
 
-		{name: "ADDLcarry", argLength: 2, reg: gp21carry, asm: "ADDL", commutative: true, resultInArg0: true},                // arg0 + arg1, generates <carry,result> pair
-		{name: "ADDLconstcarry", argLength: 1, reg: gp11carry, asm: "ADDL", aux: "Int32", resultInArg0: true},                // arg0 + auxint, generates <carry,result> pair
-		{name: "ADCL", argLength: 3, reg: gp2carry1, asm: "ADCL", commutative: true, resultInArg0: true, clobberFlags: true}, // arg0+arg1+carry(arg2), where arg2 is flags
-		{name: "ADCLconst", argLength: 2, reg: gp1carry1, asm: "ADCL", aux: "Int32", resultInArg0: true, clobberFlags: true}, // arg0+auxint+carry(arg1), where arg1 is flags
+		{name: "ADDLcarry", argLength: 2, reg: gp21carry, asm: "ADDL", commutative: true, resultInArg0: true},                          // arg0 + arg1, generates <carry,result> pair
+		{name: "ADDLconstcarry", argLength: 1, reg: gp11carry, asm: "ADDL", aux: "Int32", resultInArg0: true},                          // arg0 + auxint, generates <carry,result> pair
+		{name: "ADCL", argLength: 3, reg: gp2carry1, asm: "ADCL", commutative: true, resultInArg0: true, clobberFlags: true},           // arg0+arg1+carry(arg2), where arg2 is flags
+		{name: "ADCLcarry", argLength: 3, reg: gp2carry1carry, asm: "ADCL", commutative: true, resultInArg0: true, clobberFlags: true}, // arg0+arg1+carry(arg2), where arg2 is flags, generates <carry,result> pair
+		{name: "ADCLconst", argLength: 2, reg: gp1carry1, asm: "ADCL", aux: "Int32", resultInArg0: true, clobberFlags: true},           // arg0+auxint+carry(arg1), where arg1 is flags
 
 		{name: "SUBL", argLength: 2, reg: gp21, asm: "SUBL", resultInArg0: true, clobberFlags: true}, // arg0 - arg1
 		{name: "SUBLconst", argLength: 1, reg: gp11, asm: "SUBL", aux: "Int32", resultInArg0: true, clobberFlags: true}, // arg0 - auxint

src/cmd/compile/internal/ssa/_gen/ARM.rules

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 (Add(32|64)F ...) => (ADD(F|D) ...)
 (Add32carry ...) => (ADDS ...)
 (Add32withcarry ...) => (ADC ...)
+(Add32carrywithcarry ...) => (ADCS ...)
 
 (Sub(Ptr|32|16|8) ...) => (SUB ...)
 (Sub(32|64)F ...) => (SUB(F|D) ...)

src/cmd/compile/internal/ssa/_gen/ARMOps.go

Lines changed: 42 additions & 40 deletions
@@ -102,36 +102,37 @@ func init() {
 	)
 	// Common regInfo
 	var (
-		gp01      = regInfo{inputs: nil, outputs: []regMask{gp}}
-		gp11      = regInfo{inputs: []regMask{gpg}, outputs: []regMask{gp}}
-		gp11carry = regInfo{inputs: []regMask{gpg}, outputs: []regMask{gp, 0}}
-		gp11sp    = regInfo{inputs: []regMask{gpspg}, outputs: []regMask{gp}}
-		gp1flags  = regInfo{inputs: []regMask{gpg}}
-		gp1flags1 = regInfo{inputs: []regMask{gp}, outputs: []regMask{gp}}
-		gp21      = regInfo{inputs: []regMask{gpg, gpg}, outputs: []regMask{gp}}
-		gp21carry = regInfo{inputs: []regMask{gpg, gpg}, outputs: []regMask{gp, 0}}
-		gp2flags  = regInfo{inputs: []regMask{gpg, gpg}}
-		gp2flags1 = regInfo{inputs: []regMask{gp, gp}, outputs: []regMask{gp}}
-		gp22      = regInfo{inputs: []regMask{gpg, gpg}, outputs: []regMask{gp, gp}}
-		gp31      = regInfo{inputs: []regMask{gp, gp, gp}, outputs: []regMask{gp}}
-		gp31carry = regInfo{inputs: []regMask{gp, gp, gp}, outputs: []regMask{gp, 0}}
-		gp3flags  = regInfo{inputs: []regMask{gp, gp, gp}}
-		gp3flags1 = regInfo{inputs: []regMask{gp, gp, gp}, outputs: []regMask{gp}}
-		gpload    = regInfo{inputs: []regMask{gpspsbg}, outputs: []regMask{gp}}
-		gpstore   = regInfo{inputs: []regMask{gpspsbg, gpg}}
-		gp2load   = regInfo{inputs: []regMask{gpspsbg, gpg}, outputs: []regMask{gp}}
-		gp2store  = regInfo{inputs: []regMask{gpspsbg, gpg, gpg}}
-		fp01      = regInfo{inputs: nil, outputs: []regMask{fp}}
-		fp11      = regInfo{inputs: []regMask{fp}, outputs: []regMask{fp}}
-		fp1flags  = regInfo{inputs: []regMask{fp}}
-		fpgp      = regInfo{inputs: []regMask{fp}, outputs: []regMask{gp}, clobbers: buildReg("F15")} // int-float conversion uses F15 as tmp
-		gpfp      = regInfo{inputs: []regMask{gp}, outputs: []regMask{fp}, clobbers: buildReg("F15")}
-		fp21      = regInfo{inputs: []regMask{fp, fp}, outputs: []regMask{fp}}
-		fp31      = regInfo{inputs: []regMask{fp, fp, fp}, outputs: []regMask{fp}}
-		fp2flags  = regInfo{inputs: []regMask{fp, fp}}
-		fpload    = regInfo{inputs: []regMask{gpspsbg}, outputs: []regMask{fp}}
-		fpstore   = regInfo{inputs: []regMask{gpspsbg, fp}}
-		readflags = regInfo{inputs: nil, outputs: []regMask{gp}}
+		gp01           = regInfo{inputs: nil, outputs: []regMask{gp}}
+		gp11           = regInfo{inputs: []regMask{gpg}, outputs: []regMask{gp}}
+		gp11carry      = regInfo{inputs: []regMask{gpg}, outputs: []regMask{gp, 0}}
+		gp11sp         = regInfo{inputs: []regMask{gpspg}, outputs: []regMask{gp}}
+		gp1flags       = regInfo{inputs: []regMask{gpg}}
+		gp1flags1      = regInfo{inputs: []regMask{gp}, outputs: []regMask{gp}}
+		gp21           = regInfo{inputs: []regMask{gpg, gpg}, outputs: []regMask{gp}}
+		gp21carry      = regInfo{inputs: []regMask{gpg, gpg}, outputs: []regMask{gp, 0}}
+		gp2flags       = regInfo{inputs: []regMask{gpg, gpg}}
+		gp2flags1      = regInfo{inputs: []regMask{gp, gp}, outputs: []regMask{gp}}
+		gp2flags1carry = regInfo{inputs: []regMask{gp, gp}, outputs: []regMask{gp, 0}}
+		gp22           = regInfo{inputs: []regMask{gpg, gpg}, outputs: []regMask{gp, gp}}
+		gp31           = regInfo{inputs: []regMask{gp, gp, gp}, outputs: []regMask{gp}}
+		gp31carry      = regInfo{inputs: []regMask{gp, gp, gp}, outputs: []regMask{gp, 0}}
+		gp3flags       = regInfo{inputs: []regMask{gp, gp, gp}}
+		gp3flags1      = regInfo{inputs: []regMask{gp, gp, gp}, outputs: []regMask{gp}}
+		gpload         = regInfo{inputs: []regMask{gpspsbg}, outputs: []regMask{gp}}
+		gpstore        = regInfo{inputs: []regMask{gpspsbg, gpg}}
+		gp2load        = regInfo{inputs: []regMask{gpspsbg, gpg}, outputs: []regMask{gp}}
+		gp2store       = regInfo{inputs: []regMask{gpspsbg, gpg, gpg}}
+		fp01           = regInfo{inputs: nil, outputs: []regMask{fp}}
+		fp11           = regInfo{inputs: []regMask{fp}, outputs: []regMask{fp}}
+		fp1flags       = regInfo{inputs: []regMask{fp}}
+		fpgp           = regInfo{inputs: []regMask{fp}, outputs: []regMask{gp}, clobbers: buildReg("F15")} // int-float conversion uses F15 as tmp
+		gpfp           = regInfo{inputs: []regMask{gp}, outputs: []regMask{fp}, clobbers: buildReg("F15")}
+		fp21           = regInfo{inputs: []regMask{fp, fp}, outputs: []regMask{fp}}
+		fp31           = regInfo{inputs: []regMask{fp, fp, fp}, outputs: []regMask{fp}}
+		fp2flags       = regInfo{inputs: []regMask{fp, fp}}
+		fpload         = regInfo{inputs: []regMask{gpspsbg}, outputs: []regMask{fp}}
+		fpstore        = regInfo{inputs: []regMask{gpspsbg, fp}}
+		readflags      = regInfo{inputs: nil, outputs: []regMask{gp}}
 	)
 	ops := []opData{
 		// binary ops
@@ -161,16 +162,17 @@ func init() {
 			call: false, // TODO(mdempsky): Should this be true?
 		},
 
-		{name: "ADDS", argLength: 2, reg: gp21carry, asm: "ADD", commutative: true}, // arg0 + arg1, set carry flag
-		{name: "ADDSconst", argLength: 1, reg: gp11carry, asm: "ADD", aux: "Int32"}, // arg0 + auxInt, set carry flag
-		{name: "ADC", argLength: 3, reg: gp2flags1, asm: "ADC", commutative: true},  // arg0 + arg1 + carry, arg2=flags
-		{name: "ADCconst", argLength: 2, reg: gp1flags1, asm: "ADC", aux: "Int32"},  // arg0 + auxInt + carry, arg1=flags
-		{name: "SUBS", argLength: 2, reg: gp21carry, asm: "SUB"},                    // arg0 - arg1, set carry flag
-		{name: "SUBSconst", argLength: 1, reg: gp11carry, asm: "SUB", aux: "Int32"}, // arg0 - auxInt, set carry flag
-		{name: "RSBSconst", argLength: 1, reg: gp11carry, asm: "RSB", aux: "Int32"}, // auxInt - arg0, set carry flag
-		{name: "SBC", argLength: 3, reg: gp2flags1, asm: "SBC"},                     // arg0 - arg1 - carry, arg2=flags
-		{name: "SBCconst", argLength: 2, reg: gp1flags1, asm: "SBC", aux: "Int32"},  // arg0 - auxInt - carry, arg1=flags
-		{name: "RSCconst", argLength: 2, reg: gp1flags1, asm: "RSC", aux: "Int32"},  // auxInt - arg0 - carry, arg1=flags
+		{name: "ADDS", argLength: 2, reg: gp21carry, asm: "ADD", commutative: true},      // arg0 + arg1, set carry flag
+		{name: "ADDSconst", argLength: 1, reg: gp11carry, asm: "ADD", aux: "Int32"},      // arg0 + auxInt, set carry flag
+		{name: "ADC", argLength: 3, reg: gp2flags1, asm: "ADC", commutative: true},       // arg0 + arg1 + carry, arg2=flags
+		{name: "ADCconst", argLength: 2, reg: gp1flags1, asm: "ADC", aux: "Int32"},       // arg0 + auxInt + carry, arg1=flags
+		{name: "ADCS", argLength: 3, reg: gp2flags1carry, asm: "ADC", commutative: true}, // arg0 + arg1 + carrry, sets carry
+		{name: "SUBS", argLength: 2, reg: gp21carry, asm: "SUB"},                         // arg0 - arg1, set carry flag
+		{name: "SUBSconst", argLength: 1, reg: gp11carry, asm: "SUB", aux: "Int32"},      // arg0 - auxInt, set carry flag
+		{name: "RSBSconst", argLength: 1, reg: gp11carry, asm: "RSB", aux: "Int32"},      // auxInt - arg0, set carry flag
+		{name: "SBC", argLength: 3, reg: gp2flags1, asm: "SBC"},                          // arg0 - arg1 - carry, arg2=flags
+		{name: "SBCconst", argLength: 2, reg: gp1flags1, asm: "SBC", aux: "Int32"},       // arg0 - auxInt - carry, arg1=flags
+		{name: "RSCconst", argLength: 2, reg: gp1flags1, asm: "RSC", aux: "Int32"},       // auxInt - arg0 - carry, arg1=flags
 
 		{name: "MULLU", argLength: 2, reg: gp22, asm: "MULLU", commutative: true}, // arg0 * arg1, high 32 bits in out0, low 32 bits in out1
 		{name: "MULA", argLength: 3, reg: gp31, asm: "MULA"}, // arg0 * arg1 + arg2

src/cmd/compile/internal/ssa/_gen/MIPS.rules

Lines changed: 6 additions & 0 deletions
@@ -9,6 +9,12 @@
 (Select1 (Add32carry <t> x y)) => (SGTU <typ.Bool> x (ADD <t.FieldType(0)> x y))
 (Add32withcarry <t> x y c) => (ADD c (ADD <t> x y))
 
+(Select0 (Add32carrywithcarry <t> x y c)) => (ADD <t.FieldType(0)> c (ADD <t.FieldType(0)> x y))
+(Select1 (Add32carrywithcarry <t> x y c)) =>
+	(OR <typ.Bool>
+		(SGTU <typ.Bool> x xy:(ADD <t.FieldType(0)> x y))
+		(SGTU <typ.Bool> xy (ADD <t.FieldType(0)> c xy)))
+
 (Sub(Ptr|32|16|8) ...) => (SUB ...)
 (Sub(32|64)F ...) => (SUB(F|D) ...)

src/cmd/compile/internal/ssa/_gen/dec64.rules

Lines changed: 82 additions & 23 deletions
@@ -6,8 +6,12 @@
 // architectures. These rules work together with the decomposeBuiltin
 // pass which handles phis of these typ.
 
+(Last ___) => v.Args[len(v.Args)-1]
+
 (Int64Hi (Int64Make hi _)) => hi
 (Int64Lo (Int64Make _ lo)) => lo
+(Select0 (MakeTuple x y)) => x
+(Select1 (MakeTuple x y)) => y
 
 (Load <t> ptr mem) && is64BitInt(t) && !config.BigEndian && t.IsSigned() =>
 	(Int64Make
@@ -60,30 +64,85 @@
 		(Arg <typ.UInt32> {n} [off])
 		(Arg <typ.UInt32> {n} [off+4]))
 
-(Add64 x y) =>
-	(Int64Make
-		(Add32withcarry <typ.Int32>
-			(Int64Hi x)
-			(Int64Hi y)
-			(Select1 <types.TypeFlags> (Add32carry (Int64Lo x) (Int64Lo y))))
-		(Select0 <typ.UInt32> (Add32carry (Int64Lo x) (Int64Lo y))))
+(Add64 <t> x y) =>
+	(Last <t>
+		x0: (Int64Lo x)
+		x1: (Int64Hi x)
+		y0: (Int64Lo y)
+		y1: (Int64Hi y)
+		add: (Add32carry x0 y0)
+		(Int64Make
+			(Add32withcarry <typ.UInt32> x1 y1 (Select1 <types.TypeFlags> add))
+			(Select0 <typ.UInt32> add)))
+
+(Sub64 <t> x y) =>
+	(Last <t>
+		x0: (Int64Lo x)
+		x1: (Int64Hi x)
+		y0: (Int64Lo y)
+		y1: (Int64Hi y)
+		sub: (Sub32carry x0 y0)
+		(Int64Make
+			(Sub32withcarry <typ.UInt32> x1 y1 (Select1 <types.TypeFlags> sub))
+			(Select0 <typ.UInt32> sub)))
+
+(Mul64 <t> x y) =>
+	(Last <t>
+		x0: (Int64Lo x)
+		x1: (Int64Hi x)
+		y0: (Int64Lo y)
+		y1: (Int64Hi y)
+		x0y0: (Mul32uhilo x0 y0)
+		x0y0Hi: (Select0 <typ.UInt32> x0y0)
+		x0y0Lo: (Select1 <typ.UInt32> x0y0)
+		(Int64Make
+			(Add32 <typ.UInt32> x0y0Hi
+				(Add32 <typ.UInt32>
+					(Mul32 <typ.UInt32> x0 y1)
+					(Mul32 <typ.UInt32> x1 y0)))
+			x0y0Lo))
+
+(Mul64uhilo <t> x y) =>
+	(Last <t>
+		x0: (Int64Lo x)
+		x1: (Int64Hi x)
+		y0: (Int64Lo y)
+		y1: (Int64Hi y)
+		x0y0: (Mul32uhilo x0 y0)
+		x0y1: (Mul32uhilo x0 y1)
+		x1y0: (Mul32uhilo x1 y0)
+		x1y1: (Mul32uhilo x1 y1)
+		x0y0Hi: (Select0 <typ.UInt32> x0y0)
+		x0y0Lo: (Select1 <typ.UInt32> x0y0)
+		x0y1Hi: (Select0 <typ.UInt32> x0y1)
+		x0y1Lo: (Select1 <typ.UInt32> x0y1)
+		x1y0Hi: (Select0 <typ.UInt32> x1y0)
+		x1y0Lo: (Select1 <typ.UInt32> x1y0)
+		x1y1Hi: (Select0 <typ.UInt32> x1y1)
+		x1y1Lo: (Select1 <typ.UInt32> x1y1)
+		w1a: (Add32carry x0y0Hi x0y1Lo)
+		w2a: (Add32carrywithcarry x0y1Hi x1y0Hi (Select1 <types.TypeFlags> w1a))
+		w3a: (Add32withcarry <typ.UInt32> x1y1Hi (Const32 <typ.UInt32> [0]) (Select1 <types.TypeFlags> w2a))
+		w1b: (Add32carry x1y0Lo (Select0 <typ.UInt32> w1a))
+		w2b: (Add32carrywithcarry x1y1Lo (Select0 <typ.UInt32> w2a) (Select1 <types.TypeFlags> w1b))
+		w3b: (Add32withcarry <typ.UInt32> w3a (Const32 <typ.UInt32> [0]) (Select1 <types.TypeFlags> w2b))
+		(MakeTuple <types.NewTuple(typ.UInt64,typ.UInt64)>
+			(Int64Make w3b (Select0 <typ.UInt32> w2b))
+			(Int64Make (Select0 <typ.UInt32> w1b) x0y0Lo)))
+
+(Hmul64u x y) => (Select0 (Mul64uhilo x y))
+
+// Hacker's Delight p. 175: signed hmul = unsigned hmul - (x<0)&y - (y<0)&x.
+(Hmul64 x y) =>
+	(Last
+		p: (Hmul64u <typ.UInt64> x y)
+		xSign: (Int64Make xs:(Rsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [31])) xs)
+		ySign: (Int64Make ys:(Rsh32x32 <typ.UInt32> (Int64Hi y) (Const32 <typ.UInt32> [31])) ys)
+		(Sub64 <typ.Int64> (Sub64 <typ.Int64> p (And64 <typ.Int64> xSign y)) (And64 <typ.Int64> ySign x)))
+
+// (x+y)/2 => (x-y)/2 + y
+(Avg64u <t> x y) => (Add64 (Rsh64Ux32 <t> (Sub64 <t> x y) (Const32 <typ.UInt32> [1])) y)
 
-(Sub64 x y) =>
-	(Int64Make
-		(Sub32withcarry <typ.Int32>
-			(Int64Hi x)
-			(Int64Hi y)
-			(Select1 <types.TypeFlags> (Sub32carry (Int64Lo x) (Int64Lo y))))
-		(Select0 <typ.UInt32> (Sub32carry (Int64Lo x) (Int64Lo y))))
-
-(Mul64 x y) =>
-	(Int64Make
-		(Add32 <typ.UInt32>
-			(Mul32 <typ.UInt32> (Int64Lo x) (Int64Hi y))
-			(Add32 <typ.UInt32>
-				(Mul32 <typ.UInt32> (Int64Hi x) (Int64Lo y))
-				(Select0 <typ.UInt32> (Mul32uhilo (Int64Lo x) (Int64Lo y)))))
-		(Select1 <typ.UInt32> (Mul32uhilo (Int64Lo x) (Int64Lo y))))
 
 (And64 x y) =>
 	(Int64Make
