@@ -84,7 +84,7 @@ summation algorithm (as implemented in Base.sum) starts losing accuracy as soon
 as the condition number increases, computing only noise when the condition
 number exceeds 1/ϵ≃10¹⁶. The same goes for the naive summation algorithm.
 In contrast, both compensated algorithms
-(Kahan-Babuska-Neumaier and Ogita-Rump-Oishi) still accurately compute the
+(Kahan–Babuska–Neumaier and Ogita–Rump–Oishi) still accurately compute the
 result at this point, and only start losing accuracy there, computing meaningless
 results when the condition number reaches 1/ϵ²≃10³². In effect, these (simply)
 compensated algorithms produce the same results as if a naive summation had been
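
For reference, the Kahan–Babuska–Neumaier scheme discussed in this hunk can be sketched in a few lines of Julia. This is only an illustrative, scalar version (the name `kbn_sum` is made up here, and this is not the vectorized implementation the document describes); it shows where the extra accuracy comes from: the rounding error of each addition is recovered exactly and accumulated in a separate compensation term.

```julia
# Illustrative, scalar Kahan–Babuska–Neumaier summation
# (hypothetical helper, not the optimized implementation discussed above).
function kbn_sum(x::AbstractVector{T}) where {T<:AbstractFloat}
    s = zero(T)   # running sum
    c = zero(T)   # compensation: exactly recovered rounding errors
    for xi in x
        t = s + xi
        if abs(s) >= abs(xi)
            # low-order bits of xi were lost in s + xi; recover them
            c += (s - t) + xi
        else
            # low-order bits of s were lost instead; recover those
            c += (xi - t) + s
        end
        s = t
    end
    return s + c
end

# Ill-conditioned example: a naive left-to-right sum returns 0.0,
# while the compensated sum returns the exact answer 2.0.
kbn_sum([1.0, 1e100, 1.0, -1e100])
```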
@@ -151,8 +151,8 @@ thousands of elements), the implementation is memory bound (as expected of a
 typical BLAS1 operation), which is why we see significant decreases in
 performance when the vector can’t fit into the L1, L2 or L3 cache.
 
-On this AVX512-enabled system, the Kahan-Babuska-Neumaier implementation tends
-to be a little more efficient than the Ogita-Rump-Oishi algorithm (this would
+On this AVX512-enabled system, the Kahan–Babuska–Neumaier implementation tends
+to be a little more efficient than the Ogita–Rump–Oishi algorithm (this would
 generally be the opposite for AVX2 systems). When implemented with a suitable
 unrolling level and cache prefetching, these implementations are CPU-bound when
 vectors fit inside the L1 or L2 cache. However, when vectors are too large to
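
The Ogita–Rump–Oishi variant compared above builds on a branch-free error-free transformation (Knuth's TwoSum), which is one reason it lends itself to the SIMD unrolling discussed here. Below is a minimal scalar sketch; the names `two_sum` and `oro_sum` are illustrative and not the package's API, and the unrolled, prefetching kernels whose performance is described above are not reproduced.

```julia
# Illustrative scalar sketch of the Ogita–Rump–Oishi compensated sum
# (hypothetical names; not the optimized implementation discussed above).
@inline function two_sum(a::T, b::T) where {T<:AbstractFloat}
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)   # exact rounding error of a + b (TwoSum)
    return s, e
end

function oro_sum(x::AbstractVector{T}) where {T<:AbstractFloat}
    s = zero(T)   # high-order part of the sum
    e = zero(T)   # accumulated low-order errors
    for xi in x
        s, err = two_sum(s, xi)
        e += err
    end
    return s + e
end
```

The extra floating-point work per element (compared to a naive sum) is why unrolling and prefetching matter: in cache the compensated kernels are compute-bound, while out of cache the memory system dominates anyway.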