* [Qemu-devel] [PATCH v3 0/9] Improve buffer_is_zero
@ 2016-08-29 18:46 Richard Henderson
From: Richard Henderson @ 2016-08-29 18:46 UTC
  To: qemu-devel; +Cc: pbonzini, vijay.kilari

Changes from v2 to v3:

  * Unit testing.  This includes having x86 attempt all versions of
    the accelerator that will run on the hardware.  Thus an avx2 host
    will run the basic test 5 times (1.5sec on my laptop).

  * Drop the ppc and aarch64 specializations.  I have improved the
    basic integer version to the point that those vectorized versions
    are not a win.

    In the case of my aarch64 mustang, the integer version is 4 times
    faster than the neon version that this series deletes.  With effort
    I was able to rewrite the neon version to come within a factor of
    1.1 of the integer version, but it remained slower.  To be fair,
    gcc6 makes very good use of ldp, so the integer path is *also*
    loading 16 bytes per insn.

    I can forward my standalone aarch64 benchmark if anyone is interested.

    Note however that at least the avx2 acceleration is still very much
    a win, being about 3 times faster on my laptop.  Of course, it's
    handling 4 times as much data per loop as the integer version, so
    a 3x speedup still shows the overhead caused by using vector insns.

    For grins I wrote an avx512 version, if someone has a skylake upon
    which to test and benchmark.  That requires additional configure
    checks, so I didn't bother to include it here.
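
To illustrate the idea of an integer-only check that processes several
words per loop iteration, here is a minimal sketch.  It is not the code
from this series: the names sketch_buffer_is_zero and sketch_selftest are
hypothetical, and the choice of four 64-bit words (32 bytes) per
iteration is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration only, not the function from the patches.
 * Returns true if all LEN bytes at BUF are zero.
 */
bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t t0, t1, t2, t3;

    /* Main loop: OR four 64-bit words together, one test per 32 bytes. */
    while (len >= 32) {
        /* memcpy avoids alignment and aliasing problems; compilers
           normally turn these fixed-size copies into plain loads. */
        memcpy(&t0, p, 8);
        memcpy(&t1, p + 8, 8);
        memcpy(&t2, p + 16, 8);
        memcpy(&t3, p + 24, 8);
        if ((t0 | t1) | (t2 | t3)) {
            return false;
        }
        p += 32;
        len -= 32;
    }

    /* Tail: fewer than 32 bytes remain, check them one at a time. */
    while (len > 0) {
        if (*p != 0) {
            return false;
        }
        p++;
        len--;
    }
    return true;
}

/* Small usage example: a zero buffer, then one with a single set byte. */
void sketch_selftest(void)
{
    unsigned char b[100] = { 0 };

    assert(sketch_buffer_is_zero(b, sizeof(b)));
    b[99] = 1;
    assert(!sketch_buffer_is_zero(b, sizeof(b)));
}
```

Combining the loads with OR before a single branch keeps the loop body
short and lets the compiler pair adjacent loads where the target
supports it; whether that matches the behaviour described above for
gcc6 on aarch64 depends on the compiler and flags.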


r~


Richard Henderson (9):
  cutils: Move buffer_is_zero and subroutines to a new file
  cutils: Remove SPLAT macro
  cutils: Export only buffer_is_zero
  cutils: Rearrange buffer_is_zero acceleration
  cutils: Add test for buffer_is_zero
  cutils: Add generic prefetch
  cutils: Rewrite x86 buffer zero checking
  cutils: Remove aarch64 buffer zero checking
  cutils: Remove ppc buffer zero checking

 configure                 |  21 +--
 include/qemu/cutils.h     |   3 +-
 migration/ram.c           |   2 +-
 migration/rdma.c          |   5 +-
 tests/Makefile.include    |   3 +
 tests/test-bufferiszero.c |  78 +++++++++++
 util/Makefile.objs        |   1 +
 util/bufferiszero.c       | 332 ++++++++++++++++++++++++++++++++++++++++++++++
 util/cutils.c             | 244 ----------------------------------
 9 files changed, 423 insertions(+), 266 deletions(-)
 create mode 100644 tests/test-bufferiszero.c
 create mode 100644 util/bufferiszero.c

-- 
2.7.4


Thread overview: 15+ messages
2016-08-29 18:46 [Qemu-devel] [PATCH v3 0/9] Improve buffer_is_zero Richard Henderson
2016-08-29 18:46 ` [Qemu-devel] [PATCH v3 1/9] cutils: Move buffer_is_zero and subroutines to a new file Richard Henderson
2016-08-29 18:46 ` [Qemu-devel] [PATCH v3 3/9] cutils: Export only buffer_is_zero Richard Henderson
2016-08-29 18:46 ` [Qemu-devel] [PATCH v3 4/9] cutils: Rearrange buffer_is_zero acceleration Richard Henderson
2016-08-29 18:46 ` [Qemu-devel] [PATCH v3 5/9] cutils: Add test for buffer_is_zero Richard Henderson
2016-08-29 18:46 ` [Qemu-devel] [PATCH v3 6/9] cutils: Add generic prefetch Richard Henderson
2016-08-29 18:46 ` [Qemu-devel] [PATCH v3 7/9] cutils: Rewrite x86 buffer zero checking Richard Henderson
2016-09-13 13:26   ` Paolo Bonzini
2016-09-13 14:17     ` Paolo Bonzini
2016-09-13 14:49       ` Paolo Bonzini
2016-09-13 15:47         ` Paolo Bonzini
2016-08-29 18:46 ` [Qemu-devel] [PATCH v3 8/9] cutils: Remove aarch64 " Richard Henderson
2016-08-29 18:46 ` [Qemu-devel] [PATCH v3 9/9] cutils: Remove ppc " Richard Henderson
2016-08-30 11:48 ` [Qemu-devel] [PATCH v3 0/9] Improve buffer_is_zero Paolo Bonzini
2016-09-05 15:08 ` Dr. David Alan Gilbert
