On Thu, Jul 22, 2021 at 3:14 PM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
* Richard Henderson (richard.henderson@linaro.org) wrote:
> On 7/22/21 12:02 AM, Dr. David Alan Gilbert wrote:
> > Hi Richard,
> >    I think you were the last person to fiddle with the prefetching
> > in buffer_zero_avx2 and friends; Joe (cc'd) wondered if explicit
> > prefetching still made sense on modern CPUs, and that their hardware
> > generally figures stuff out better on simple increments.
> >
> >    What was your thinking on this, and did you actually measure
> > any improvement?
>
> Ah, well, that was 5 years ago so I have no particular memory of this.  It
> wouldn't surprise me if you can't measure any improvement on modern
> hardware.
>
> Do you now measure an improvement with the prefetches gone?

Not tried; it just came from Joe's suggestion that it was generally a
bad idea these days. I do remember that the behaviour of those functions
is quite tricky because their performance is VERY data dependent - many
VMs actually have pages that are quite dirty so you never iterate the
loop, but then you hit others with big zero pages and you spend your
entire life in the loop.
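
The data dependence described above can be illustrated with a minimal
sketch. It is not QEMU's buffer_zero_avx2(); words_scanned() is a
hypothetical scalar helper, made up for illustration, that reports how
many 64-bit words it read before reaching a verdict.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): scan a buffer 8 bytes at a
 * time, store the verdict in *is_zero, and return the number of 64-bit
 * words that had to be read before the verdict was known.
 */
static size_t words_scanned(const void *buf, size_t len, int *is_zero)
{
    const unsigned char *p = buf;
    size_t n = len / sizeof(uint64_t);

    /* Keep the sketch simple: only whole 64-bit words are handled. */
    assert(len % sizeof(uint64_t) == 0);

    for (size_t i = 0; i < n; i++) {
        uint64_t w;

        /* memcpy avoids alignment and aliasing problems. */
        memcpy(&w, p + i * sizeof(w), sizeof(w));
        if (w != 0) {
            *is_zero = 0;
            return i + 1;   /* dirty data: early exit */
        }
    }
    *is_zero = 1;
    return n;               /* all zero: the whole buffer was walked */
}
```

For a 4096-byte page this reads 512 words when the page is all zero, but
only 1 word when the first byte is non-zero, so any cost inside the loop
only matters for the mostly-zero pages.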


Dave, Richard:
My curiosity got the better of me, so I wrote a small test program that uses the buffer_zero_avx2() routine from QEMU's bufferiszero.c.

When I run it on an Intel Cascade Lake processor, the cost of calling "__builtin_prefetch(p)" is in the noise range.  It's always "just slightly" slower.  I doubt it could ever be measured in QEMU.

Ironically, when I disabled the hardware prefetchers, the program slowed down by over 33%, and the call to "__builtin_prefetch(p)" actually hurt performance by a further 3% or more.

My results are below (with the hardware prefetchers enabled only).  The program is attached.
Joe
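
For readers without the attachment, the kind of loop under test might
look roughly like the sketch below. It is not the attached program and
not QEMU's code: buffer_is_zero_sketch() is a hypothetical portable
stand-in that ORs together 64 bytes per iteration, and prefetching one
64-byte block ahead is an assumption. Only the DO_PREFETCH macro and
__builtin_prefetch() come from this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the routine under test. Returns 1 if the
 * buffer contains only zero bytes, 0 otherwise.
 */
static int buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    const unsigned char *end;

    assert(buf != NULL || len == 0);
    end = p + len;

    /* Main loop: examine one 64-byte block per iteration. */
    while (end - p >= 64) {
        uint64_t w[8];
        uint64_t acc = 0;

#ifdef DO_PREFETCH
        /* Assumed placement: hint the next block. A prefetch is only a
         * hint and does not fault, even just past the buffer. */
        __builtin_prefetch(p + 64);
#endif
        memcpy(w, p, sizeof(w));
        for (size_t i = 0; i < 8; i++) {
            acc |= w[i];
        }
        if (acc != 0) {
            return 0;
        }
        p += 64;
    }

    /* Tail: fewer than 64 bytes left, check them one by one. */
    for (; p < end; p++) {
        if (*p != 0) {
            return 0;
        }
    }
    return 1;
}
```

Building the same file with and without -DDO_PREFETCH, as in the
commands below, gives two binaries that differ only in the prefetch
hint.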

# gcc -mavx buffer_zero_avx.c -O -DDO_PREFETCH ; for i in {1..5}; do ./a.out; done
TSC 356144 Kcycles.
TSC 356714 Kcycles.
TSC 356707 Kcycles.
TSC 356565 Kcycles.
TSC 356853 Kcycles.
# gcc -mavx buffer_zero_avx.c -O ; for i in {1..5}; do ./a.out; done
TSC 355520 Kcycles.
TSC 355961 Kcycles.
TSC 355872 Kcycles.
TSC 355948 Kcycles.
TSC 355918 Kcycles.
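
To put a number on "just slightly" slower, the two sets of runs above
can be averaged and compared. The sketch below does only that
arithmetic; mean_kcycles() and percent_slower() are hypothetical helper
names, and the arrays hold the Kcycle counts copied from the output
above.

```c
#include <assert.h>
#include <stddef.h>

/* Kcycle counts copied from the runs above. */
static const long with_prefetch[5] = {
    356144, 356714, 356707, 356565, 356853
};
static const long without_prefetch[5] = {
    355520, 355961, 355872, 355948, 355918
};

/* Hypothetical helper: arithmetic mean of n samples. */
static double mean_kcycles(const long *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        sum += (double)v[i];
    }
    return sum / (double)n;
}

/* Hypothetical helper: how much slower a is than b, in percent. */
static double percent_slower(double a, double b)
{
    return 100.0 * (a - b) / b;
}
```

The means come out at about 356597 Kcycles with the prefetch and about
355844 Kcycles without it, so percent_slower() of the two means is
roughly 0.2%. That gap of about 750 Kcycles is comparable to the
run-to-run spread of about 700 Kcycles in the first set of runs.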

Dave
>
> r~
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK