Description
define <16 x i16> @concat_uniform_shift_v16i16_v8i16(<8 x i16> %a0, <8 x i16> %a1) {
%v0 = shl <8 x i16> %a0, splat (i16 7)
%v1 = shl <8 x i16> %a1, splat (i16 7)
%res = shufflevector <8 x i16> %v0, <8 x i16> %v1, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i16> %res
}
With -passes=vector-combine -mtriple=x86_64-- -mcpu=znver2 this currently gives:
define <16 x i16> @concat_uniform_shift_v16i16_v8i16(<8 x i16> %a0, <8 x i16> %a1) #0 {
%v0 = shl <8 x i16> %a0, splat (i16 7)
%v1 = shl <8 x i16> %a1, splat (i16 7)
%res = shufflevector <8 x i16> %v0, <8 x i16> %v1, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i16> %res
}
when it should be:
define <16 x i16> @concat_uniform_shift_v16i16_v8i16(<8 x i16> %a0, <8 x i16> %a1) #0 {
%v = shufflevector <8 x i16> %a0, <8 x i16> %a1, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%res = shl <16 x i16> %v, splat (i16 7)
ret <16 x i16> %res
}
VectorCombine fails to concatenate these shifts because the getArithmeticInstrCost calls used for "NewCost" don't pass any TTI::OperandValueInfo data. Both of the shift amounts are OK_UniformConstantValue, so at the very least we should merge these into OK_NonUniformConstantValue.
A possible starting point would be to add TTI::getOperandInfo calls in the binop NewCost calculation for both LHS/RHS ops, and create some kind of OperandValueInfo::mergeInfo(X, Y) static helper that merges the common info data (all constant, all power-of-2, etc. - although all uniform can't be done this way!).
But for this particular case, the RHS operands are identical - so we don't need to merge, we can just use the original OperandValueInfo.
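To make the idea concrete, here is a rough standalone sketch of what such a merge could look like. It is not LLVM code: the enums and struct below are simplified stand-ins that only mirror the names in TargetTransformInfo, and both mergeInfo and infoForConcat (including its SameOperand flag) are hypothetical helpers that don't exist in tree.

```cpp
#include <cassert>

// Simplified stand-ins mirroring the TTI enum names; NOT the real LLVM types.
enum OperandValueKind {
  OK_AnyValue,
  OK_UniformValue,
  OK_UniformConstantValue,
  OK_NonUniformConstantValue
};
enum OperandValueProperties { OP_None, OP_PowerOf2, OP_NegatedPowerOf2 };

struct OperandValueInfo {
  OperandValueKind Kind;
  OperandValueProperties Properties;

  bool isConstant() const {
    return Kind == OK_UniformConstantValue ||
           Kind == OK_NonUniformConstantValue;
  }
};

// Hypothetical helper: keep only what still holds once X and Y are
// concatenated into one wider operand.
inline OperandValueInfo mergeInfo(const OperandValueInfo &X,
                                  const OperandValueInfo &Y) {
  OperandValueInfo Merged = {OK_AnyValue, OP_None};
  // Constant + constant stays constant, but uniformity is dropped:
  // two different splats concatenated are not a splat.
  if (X.isConstant() && Y.isConstant())
    Merged.Kind = OK_NonUniformConstantValue;
  // A property survives only if both halves share it.
  if (X.Properties == Y.Properties)
    Merged.Properties = X.Properties;
  return Merged;
}

// Hypothetical wrapper: if both binops use the very same operand value
// (as with the two splat (i16 7) shift amounts here), no merge is needed
// and the original info can be reused as-is.
inline OperandValueInfo infoForConcat(const OperandValueInfo &X,
                                      const OperandValueInfo &Y,
                                      bool SameOperand) {
  return SameOperand ? X : mergeInfo(X, Y);
}
```

With this shape, two distinct uniform-constant shift amounts would merge to OK_NonUniformConstantValue, while the identical-operand shortcut keeps OK_UniformConstantValue, which is what this test case needs.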
foldPermuteOfBinops has a similar problem, but there both OldCost and NewCost use getArithmeticInstrCost - which is weird, as I'd expect OldCost to use getInstructionCost at least.