qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512()
@ 2020-02-13  7:52 Robert Hoo
  2020-02-13  7:52 ` [PATCH 1/2] configure: add configure option avx512f_opt Robert Hoo
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Robert Hoo @ 2020-02-13  7:52 UTC (permalink / raw)
  To: qemu-devel, pbonzini, laurent, philmd, berrange; +Cc: robert.hu, Robert Hoo

1. Add avx512_opt option and enable it when host has the ability

2. Implement new buffer_zero_avx512() with AVX512F instructions

Robert Hoo (2):
  configure: add configure option avx512f_opt
  util: add function buffer_zero_avx512()

 configure            | 39 ++++++++++++++++++++++++++++++++++++
 include/qemu/cpuid.h |  3 +++
 util/bufferiszero.c  | 56 +++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 95 insertions(+), 3 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] configure: add configure option avx512f_opt
  2020-02-13  7:52 [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512() Robert Hoo
@ 2020-02-13  7:52 ` Robert Hoo
  2020-02-13  7:52 ` [PATCH 2/2] util: add util function buffer_zero_avx512() Robert Hoo
  2020-02-13  8:40 ` [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512() no-reply
  2 siblings, 0 replies; 11+ messages in thread
From: Robert Hoo @ 2020-02-13  7:52 UTC (permalink / raw)
  To: qemu-devel, pbonzini, laurent, philmd, berrange; +Cc: robert.hu, Robert Hoo

Like previous avx2_opt option, config-host.mak will have CONFIG_AVX512F_OPT
defined if compiling host has the ability.

AVX512F instruction set is available since Intel Skylake.
More info:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 configure | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/configure b/configure
index 115dc38..9bf8de0 100755
--- a/configure
+++ b/configure
@@ -1382,6 +1382,11 @@ for opt do
   ;;
   --enable-avx2) avx2_opt="yes"
   ;;
+  --disable-avx512f) avx512f_opt="no"
+  ;;
+  --enable-avx512f) avx512f_opt="yes"
+  ;;
+
   --enable-glusterfs) glusterfs="yes"
   ;;
   --disable-virtio-blk-data-plane|--enable-virtio-blk-data-plane)
@@ -1811,6 +1816,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   tcmalloc        tcmalloc support
   jemalloc        jemalloc support
   avx2            AVX2 optimization support
+  avx512f         AVX512F optimization support
   replication     replication support
   opengl          opengl support
   virglrenderer   virgl rendering support
@@ -5481,6 +5487,34 @@ EOF
   fi
 fi
 
+##########################################
+# avx512f optimization requirement check
+#
+# There is no point enabling this if cpuid.h is not usable,
+# since we won't be able to select the new routines.
+
+if test "$cpuid_h" = "yes" && test "$avx512f_opt" != "no"; then
+  cat > $TMPC << EOF
+#pragma GCC push_options
+#pragma GCC target("avx512f")
+#include <cpuid.h>
+#include <immintrin.h>
+static int bar(void *a) {
+    __m512i x = *(__m512i *)a;
+    return _mm512_test_epi64_mask(x, x);
+}
+int main(int argc, char *argv[])
+{
+	return bar(argv[0]);
+}
+EOF
+  if compile_object "" ; then
+    avx512f_opt="yes"
+  else
+    avx512f_opt="no"
+  fi
+fi
+
 ########################################
 # check if __[u]int128_t is usable.
 
@@ -6605,6 +6639,7 @@ echo "libxml2           $libxml2"
 echo "tcmalloc support  $tcmalloc"
 echo "jemalloc support  $jemalloc"
 echo "avx2 optimization $avx2_opt"
+echo "avx512f optimization $avx512f_opt"
 echo "replication support $replication"
 echo "VxHS block device $vxhs"
 echo "bochs support     $bochs"
@@ -7152,6 +7187,10 @@ if test "$avx2_opt" = "yes" ; then
   echo "CONFIG_AVX2_OPT=y" >> $config_host_mak
 fi
 
+if test "$avx512f_opt" = "yes" ; then
+  echo "CONFIG_AVX512F_OPT=y" >> $config_host_mak
+fi
+
 if test "$lzo" = "yes" ; then
   echo "CONFIG_LZO=y" >> $config_host_mak
 fi
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] util: add util function buffer_zero_avx512()
  2020-02-13  7:52 [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512() Robert Hoo
  2020-02-13  7:52 ` [PATCH 1/2] configure: add configure option avx512f_opt Robert Hoo
@ 2020-02-13  7:52 ` Robert Hoo
  2020-02-13 10:30   ` Paolo Bonzini
  2020-02-13 18:20   ` Richard Henderson
  2020-02-13  8:40 ` [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512() no-reply
  2 siblings, 2 replies; 11+ messages in thread
From: Robert Hoo @ 2020-02-13  7:52 UTC (permalink / raw)
  To: qemu-devel, pbonzini, laurent, philmd, berrange; +Cc: robert.hu, Robert Hoo

And initialize buffer_is_zero() with it, when Intel AVX512F is
available on host.

This function utilizes Intel AVX512 fundamental instructions which
perform over previous AVX2 instructions.

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 include/qemu/cpuid.h |  3 +++
 util/bufferiszero.c  | 56 +++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h
index 6930170..09fc245 100644
--- a/include/qemu/cpuid.h
+++ b/include/qemu/cpuid.h
@@ -45,6 +45,9 @@
 #ifndef bit_AVX2
 #define bit_AVX2        (1 << 5)
 #endif
+#ifndef bit_AVX512F
+#define bit_AVX512F        (1 << 16)
+#endif
 #ifndef bit_BMI2
 #define bit_BMI2        (1 << 8)
 #endif
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index bfb2605..cbb854a 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -187,12 +187,54 @@ buffer_zero_avx2(const void *buf, size_t len)
 #pragma GCC pop_options
 #endif /* CONFIG_AVX2_OPT */
 
+#ifdef CONFIG_AVX512F_OPT
+#pragma GCC push_options
+#pragma GCC target("avx512f")
+#include <immintrin.h>
+
+static bool
+buffer_zero_avx512(const void *buf, size_t len)
+{
+    __m512i t;
+    __m512i *p, *e;
+
+    if (unlikely(len < 64)) { /*buff less than 512 bits, unlikely*/
+        return buffer_zero_int(buf, len);
+    }
+    /* Begin with an unaligned head of 64 bytes.  */
+    t = _mm512_loadu_si512(buf);
+    p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64);
+    e = (__m512i *)(((uintptr_t)buf + len) & -64);
+
+    /* Loop over 64-byte aligned blocks of 256.  */
+    while (p < e) {
+        __builtin_prefetch(p);
+        if (unlikely(_mm512_test_epi64_mask(t, t))) {
+            return false;
+        }
+        t = p[-4] | p[-3] | p[-2] | p[-1];
+        p += 4;
+    }
+
+    t |= _mm512_loadu_si512(buf + len - 4 * 64);
+    t |= _mm512_loadu_si512(buf + len - 3 * 64);
+    t |= _mm512_loadu_si512(buf + len - 2 * 64);
+    t |= _mm512_loadu_si512(buf + len - 1 * 64);
+
+    return !_mm512_test_epi64_mask(t, t);
+
+}
+#pragma GCC pop_options
+#endif
+
+
 /* Note that for test_buffer_is_zero_next_accel, the most preferred
  * ISA must have the least significant bit.
  */
-#define CACHE_AVX2    1
-#define CACHE_SSE4    2
-#define CACHE_SSE2    4
+#define CACHE_AVX512F 1
+#define CACHE_AVX2    2
+#define CACHE_SSE4    4
+#define CACHE_SSE2    6
 
 /* Make sure that these variables are appropriately initialized when
  * SSE2 is enabled on the compiler command-line, but the compiler is
@@ -226,6 +268,11 @@ static void init_accel(unsigned cache)
         fn = buffer_zero_avx2;
     }
 #endif
+#ifdef CONFIG_AVX512F_OPT
+    if (cache & CACHE_AVX512F) {
+        fn = buffer_zero_avx512;
+    }
+#endif
     buffer_accel = fn;
 }
 
@@ -255,6 +302,9 @@ static void __attribute__((constructor)) init_cpuid_cache(void)
             if ((bv & 6) == 6 && (b & bit_AVX2)) {
                 cache |= CACHE_AVX2;
             }
+            if ((bv & 6) == 6 && (b & bit_AVX512F)) {
+                cache |= CACHE_AVX512F;
+            }
         }
     }
     cpuid_cache = cache;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512()
  2020-02-13  7:52 [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512() Robert Hoo
  2020-02-13  7:52 ` [PATCH 1/2] configure: add configure option avx512f_opt Robert Hoo
  2020-02-13  7:52 ` [PATCH 2/2] util: add util function buffer_zero_avx512() Robert Hoo
@ 2020-02-13  8:40 ` no-reply
  2 siblings, 0 replies; 11+ messages in thread
From: no-reply @ 2020-02-13  8:40 UTC (permalink / raw)
  To: robert.hu
  Cc: berrange, qemu-devel, laurent, robert.hu, pbonzini, robert.hu, philmd

Patchew URL: https://patchew.org/QEMU/1581580379-54109-1-git-send-email-robert.hu@linux.intel.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

qemu-system-aarch64: check_section_footer: Read section footer failed: -5
qemu-system-aarch64: load of migration failed: Invalid argument
/tmp/qemu-test/src/tests/qtest/libqtest.c:140: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
ERROR - too few tests run (expected 15, got 2)
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    check-unit: tests/test-bitmap
  TEST    check-unit: tests/test-aio
---
qemu-system-x86_64: check_section_footer: Read section footer failed: -5
qemu-system-x86_64: load of migration failed: Invalid argument
/tmp/qemu-test/src/tests/qtest/libqtest.c:140: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
ERROR - too few tests run (expected 74, got 62)
make: *** [check-qtest-x86_64] Error 1
  TEST    check-unit: tests/test-bufferiszero
ERROR - too few tests run (expected 1, got 0)
make: *** [check-unit] Error 1
  TEST    iotest-qcow2: 030 [fail]
QEMU          -- "/tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64" -nodefaults -display none -accel qtest
QEMU_IMG      -- "/tmp/qemu-test/build/tests/qemu-iotests/../../qemu-img" 
---
+++ /tmp/qemu-test/build/tests/qemu-iotests/030.out.bad 2020-02-13 08:33:05.905912850 +0000
@@ -1,5 +1,335 @@
-...........................
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/img-8.img,format=qcow2,cache=writeback,aio=threads,backing.backing.backing.backing.backing.backing.backing.backing.node-name=node0,backing.backing.backing.backing.backing.backing.backing.node-name=node1,backing.backing.backing.backing.backing.backing.node-name=node2,backing.backing.backing.backing.backing.node-name=node3,backing.backing.backing.backing.node-name=node4,backing.backing.backing.node-name=node5,backing.backing.node-name=node6,backing.node-name=node7,node-name=node8
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/img-8.img,format=qcow2,cache=writeback,aio=threads,backing.backing.backing.backing.backing.backing.backing.backing.node-name=node0,backing.backing.backing.backing.backing.backing.backing.node-name=node1,backing.backing.backing.backing.backing.backing.node-name=node2,backing.backing.backing.backing.backing.node-name=node3,backing.backing.backing.backing.node-name=node4,backing.backing.backing.node-name=node5,backing.backing.node-name=node6,backing.node-name=node7,node-name=node8
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/img-8.img,format=qcow2,cache=writeback,aio=threads,backing.backing.backing.backing.backing.backing.backing.backing.node-name=node0,backing.backing.backing.backing.backing.backing.backing.node-name=node1,backing.backing.backing.backing.backing.backing.node-name=node2,backing.backing.backing.backing.backing.node-name=node3,backing.backing.backing.backing.node-name=node4,backing.backing.backing.node-name=node5,backing.backing.node-name=node6,backing.node-name=node7,node-name=node8
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/img-8.img,format=qcow2,cache=writeback,aio=threads,backing.backing.backing.backing.backing.backing.backing.backing.node-name=node0,backing.backing.backing.backing.backing.backing.backing.node-name=node1,backing.backing.backing.backing.backing.backing.node-name=node2,backing.backing.backing.backing.backing.node-name=node3,backing.backing.backing.backing.node-name=node4,backing.backing.backing.node-name=node5,backing.backing.node-name=node6,backing.node-name=node7,node-name=node8
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/img-8.img,format=qcow2,cache=writeback,aio=threads,backing.backing.backing.backing.backing.backing.backing.backing.node-name=node0,backing.backing.backing.backing.backing.backing.backing.node-name=node1,backing.backing.backing.backing.backing.backing.node-name=node2,backing.backing.backing.backing.backing.node-name=node3,backing.backing.backing.backing.node-name=node4,backing.backing.backing.node-name=node5,backing.backing.node-name=node6,backing.node-name=node7,node-name=node8
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/img-8.img,format=qcow2,cache=writeback,aio=threads,backing.backing.backing.backing.backing.backing.backing.backing.node-name=node0,backing.backing.backing.backing.backing.backing.backing.node-name=node1,backing.backing.backing.backing.backing.backing.node-name=node2,backing.backing.backing.backing.backing.node-name=node3,backing.backing.backing.backing.node-name=node4,backing.backing.backing.node-name=node5,backing.backing.node-name=node6,backing.node-name=node7,node-name=node8
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,driver=quorum,vote-threshold=2,children.0.file.filename=/tmp/qemu-test/img-0.img,children.0.node-name=node0,children.1.file.filename=/tmp/qemu-test/img-1.img,children.1.node-name=node1,children.2.file.filename=/tmp/qemu-test/img-2.img,children.2.node-name=node2
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads,backing.node-name=mid,backing.backing.node-name=base
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads,backing.node-name=mid,backing.backing.node-name=base
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads,backing.node-name=mid,backing.backing.node-name=base
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads,backing.node-name=mid,backing.backing.node-name=base
+EEEEEE...EEFFFEEE...EE.EEEE
+======================================================================
+ERROR: test_enospc (__main__.TestEIO)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 765, in test_enospc
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_ignore (__main__.TestEIO)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 689, in test_ignore
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_report (__main__.TestEIO)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 661, in test_report
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_stop (__main__.TestEIO)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 722, in test_stop
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_enospc (__main__.TestENOSPC)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 811, in test_enospc
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_overlapping_1 (__main__.TestParallelOps)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 279, in test_overlapping_1
---
+ConnectionResetError: [Errno 104] Connection reset by peer
+
+======================================================================
+ERROR: test_overlapping_5 (__main__.TestParallelOps)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 414, in test_overlapping_5
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 240, in pull_event
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_stream_base_node_name (__main__.TestParallelOps)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 519, in test_stream_base_node_name
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_stream_quorum (__main__.TestQuorum)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 573, in test_stream_quorum
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_set_speed (__main__.TestSetSpeed)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 919, in test_set_speed
---
+ConnectionResetError: [Errno 104] Connection reset by peer
+
+======================================================================
+ERROR: test_set_speed_invalid (__main__.TestSetSpeed)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 955, in test_set_speed_invalid
---
+ConnectionResetError: [Errno 104] Connection reset by peer
+
+======================================================================
+ERROR: test_stream (__main__.TestSingleDrive)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 56, in test_stream
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_stream_intermediate (__main__.TestSingleDrive)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 75, in test_stream_intermediate
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_stream_partial (__main__.TestSingleDrive)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 138, in test_stream_partial
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_stream_pause (__main__.TestSingleDrive)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 92, in test_stream_pause
---
+ConnectionResetError: [Errno 104] Connection reset by peer
+
+======================================================================
+ERROR: test_stream (__main__.TestSmallerBackingFile)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 600, in test_stream
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 260, in get_events
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
+
+======================================================================
+ERROR: test_stream_stop (__main__.TestStreamStop)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 854, in setUp
---
+OSError: [Errno 98] Address already in use
+
+======================================================================
+FAIL: test_stream_commit_1 (__main__.TestParallelOps)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 431, in test_stream_commit_1
---
+AssertionError: failed path traversal for "return" in "None"
+
+======================================================================
+FAIL: test_stream_commit_2 (__main__.TestParallelOps)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 463, in test_stream_commit_2
---
+AssertionError: failed path traversal for "return" in "None"
+
+======================================================================
+FAIL: test_stream_parallel (__main__.TestParallelOps)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "030", line 242, in test_stream_parallel
---
 Ran 27 tests
 
-OK
+FAILED (failures=3, errors=17)
  TEST    iotest-qcow2: 031
  TEST    iotest-qcow2: 032
  TEST    iotest-qcow2: 033 [fail]
---
+qemu-img received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../qemu-img compare -f qcow2 -F qcow2 /tmp/qemu-test/quorum2.img /tmp/qemu-test/quorum_repair.img
+................FF.....F...................................................................
+======================================================================
+FAIL: test_cancel_after_ready (__main__.TestRepairQuorum)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "041", line 952, in test_cancel_after_ready
---
+AssertionError: False is not true : target image does not match source after mirroring
+
+======================================================================
+FAIL: test_complete (__main__.TestRepairQuorum)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "041", line 922, in test_complete
---
+AssertionError: False is not true : target image does not match source after mirroring
+
+======================================================================
+FAIL: test_pause (__main__.TestRepairQuorum)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "041", line 977, in test_pause
---
 Ran 91 tests
 
-OK
+FAILED (failures=3)
  TEST    iotest-qcow2: 042
  TEST    iotest-qcow2: 043
  TEST    iotest-qcow2: 046
---
 --- Checking and retrying ---
 virtual size: 64 MiB (67108864 bytes)
+Repairing refcount block 1 is outside image
+ERROR cluster 1048576 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
---
+./common.rc: line 136: 24365 Illegal instruction     (core dumped) ( VALGRIND_QEMU="${VALGRIND_QEMU_IMG}" _qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IMG_PROG" $QEMU_IMG_OPTIONS "$@" )
 --- Repairing ---
 Repairing refcount block 1 is outside image
 ERROR refcount block 2 is not cluster aligned; refcount table entry corrupted
  TEST    iotest-qcow2: 061
  TEST    iotest-qcow2: 062
  TEST    iotest-qcow2: 063 [fail]
---
--- /tmp/qemu-test/src/tests/qemu-iotests/203.out       2020-02-13 07:51:30.000000000 +0000
+++ /tmp/qemu-test/build/tests/qemu-iotests/203.out.bad 2020-02-13 08:37:57.222539469 +0000
@@ -1,3 +1,4 @@
+WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-26219-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-26219-qtest.sock -accel qtest -nodefaults -display none -accel qtest -object iothread,id=iothread0 -drive if=none,id=drive0,file=/tmp/qemu-test/26219-disk0.img,format=qcow2,cache=writeback,aio=threads,node-name=drive0-node -drive if=none,id=drive1,file=/tmp/qemu-test/26219-disk1.img,format=qcow2,cache=writeback,aio=threads,node-name=drive1-node
 Launching VM...
 Setting IOThreads...
 {"return": {}}
---
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 240, in pull_event
+    self.__get_events(wait)
+  File "/tmp/qemu-test/src/tests/qemu-iotests/../../python/qemu/qmp.py", line 135, in __get_events
+    raise QMPConnectError("Error while reading from socket")
+qemu.qmp.QMPConnectError: Error while reading from socket
  TEST    iotest-qcow2: 214
  TEST    iotest-qcow2: 217
  TEST    iotest-qcow2: 220
---
+++ /tmp/qemu-test/build/tests/qemu-iotests/251.out.bad 2020-02-13 08:39:47.528699147 +0000
@@ -5,18 +5,7 @@
 
 qemu-img: warning: error while reading block status at offset status_fail_offset_0: Input/output error
 qemu-img: warning: error while reading block status at offset status_fail_offset_1: Input/output error
-qemu-img: warning: error while reading block status at offset status_fail_offset_0: Input/output error
-qemu-img: warning: error while reading offset read_fail_offset_0: Input/output error
-qemu-img: warning: error while reading block status at offset status_fail_offset_1: Input/output error
-qemu-img: warning: error while reading offset status_fail_offset_1: Input/output error
-qemu-img: warning: error while reading offset read_fail_offset_2: Input/output error
-qemu-img: warning: error while reading offset read_fail_offset_3: Input/output error
-qemu-img: warning: error while reading offset read_fail_offset_4: Input/output error
-qemu-img: warning: error while reading offset read_fail_offset_5: Input/output error
-qemu-img: warning: error while reading offset read_fail_offset_6: Input/output error
-qemu-img: warning: error while reading offset read_fail_offset_7: Input/output error
-qemu-img: warning: error while reading offset read_fail_offset_8: Input/output error
-qemu-img: warning: error while reading offset read_fail_offset_9: Input/output error
+./common.rc: line 136: 28190 Illegal instruction     (core dumped) ( VALGRIND_QEMU="${VALGRIND_QEMU_IMG}" _qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IMG_PROG" $QEMU_IMG_OPTIONS "$@" )
 
 wrote 512/512 bytes at offset read_fail_offset_0
---
  TEST    iotest-qcow2: 283
Failures: 013 018 019 030 033 041 048 053 060 063 072 086 089 141 181 203 244 251 267
Failed 19 of 116 iotests
make: *** [check-tests/check-block.sh] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=4541c4db1f564563ad1446ded8f742dd', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-9s8159vz/src/docker-src.2020-02-13-03.28.34.18967:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=4541c4db1f564563ad1446ded8f742dd
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-9s8159vz/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    11m30.763s
user    0m9.868s


The full log is available at
http://patchew.org/logs/1581580379-54109-1-git-send-email-robert.hu@linux.intel.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
  2020-02-13  7:52 ` [PATCH 2/2] util: add util function buffer_zero_avx512() Robert Hoo
@ 2020-02-13 10:30   ` Paolo Bonzini
  2020-02-13 11:58     ` Robert Hoo
  2020-02-13 18:20   ` Richard Henderson
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2020-02-13 10:30 UTC (permalink / raw)
  To: Robert Hoo, qemu-devel, laurent, philmd, berrange; +Cc: robert.hu

On 13/02/20 08:52, Robert Hoo wrote:
> +
> +}
> +#pragma GCC pop_options
> +#endif
> +
> +
>  /* Note that for test_buffer_is_zero_next_accel, the most preferred
>   * ISA must have the least significant bit.
>   */
> -#define CACHE_AVX2    1
> -#define CACHE_SSE4    2
> -#define CACHE_SSE2    4
> +#define CACHE_AVX512F 1
> +#define CACHE_AVX2    2
> +#define CACHE_SSE4    4
> +#define CACHE_SSE2    6

This should be 8, not 6.

Paolo

>  
>  /* Make sure that these variables are appropriately initialized when
>   * SSE2 is enabled on the compiler command-line, but the compiler is
> @@ -226,6 +268,11 @@ static void init_accel(unsigned cache)
>          fn = buffer_zero_avx2;
>      }
>  #endif
> +#ifdef CONFIG_AVX512F_OPT
> +    if (cache & CACHE_AVX512F) {
> +        fn = buffer_zero_avx512;
> +    }
> +#endif
>      buffer_accel = fn;
>  }
>  
> @@ -255,6 +302,9 @@ static void __attribute__((constructor)) init_cpuid_cache(void)
>              if ((bv & 6) == 6 && (b & bit_AVX2)) {
>                  cache |= CACHE_AVX2;
>              }
> +            if ((bv & 6) == 6 && (b & bit_AVX512F)) {
> +                cache |= CACHE_AVX512F;
> +            }
>          }



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
  2020-02-13 10:30   ` Paolo Bonzini
@ 2020-02-13 11:58     ` Robert Hoo
  0 siblings, 0 replies; 11+ messages in thread
From: Robert Hoo @ 2020-02-13 11:58 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, laurent, philmd, berrange; +Cc: robert.hu

On Thu, 2020-02-13 at 11:30 +0100, Paolo Bonzini wrote:
> On 13/02/20 08:52, Robert Hoo wrote:
> > +
> > +}
> > +#pragma GCC pop_options
> > +#endif
> > +
> > +
> >  /* Note that for test_buffer_is_zero_next_accel, the most
> > preferred
> >   * ISA must have the least significant bit.
> >   */
> > -#define CACHE_AVX2    1
> > -#define CACHE_SSE4    2
> > -#define CACHE_SSE2    4
> > +#define CACHE_AVX512F 1
> > +#define CACHE_AVX2    2
> > +#define CACHE_SSE4    4
> > +#define CACHE_SSE2    6
> 
> This should be 8, not 6.
> 
> Paolo

Thanks Paolo, going to fix it in v2.
> 
> >  
> >  /* Make sure that these variables are appropriately initialized
> > when
> >   * SSE2 is enabled on the compiler command-line, but the compiler
> > is
> > @@ -226,6 +268,11 @@ static void init_accel(unsigned cache)
> >          fn = buffer_zero_avx2;
> >      }
> >  #endif
> > +#ifdef CONFIG_AVX512F_OPT
> > +    if (cache & CACHE_AVX512F) {
> > +        fn = buffer_zero_avx512;
> > +    }
> > +#endif
> >      buffer_accel = fn;
> >  }
> >  
> > @@ -255,6 +302,9 @@ static void __attribute__((constructor))
> > init_cpuid_cache(void)
> >              if ((bv & 6) == 6 && (b & bit_AVX2)) {
> >                  cache |= CACHE_AVX2;
> >              }
> > +            if ((bv & 6) == 6 && (b & bit_AVX512F)) {
> > +                cache |= CACHE_AVX512F;
> > +            }
> >          }
> 
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
  2020-02-13  7:52 ` [PATCH 2/2] util: add util function buffer_zero_avx512() Robert Hoo
  2020-02-13 10:30   ` Paolo Bonzini
@ 2020-02-13 18:20   ` Richard Henderson
  2020-02-24  7:07     ` Robert Hoo
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2020-02-13 18:20 UTC (permalink / raw)
  To: Robert Hoo, qemu-devel, pbonzini, laurent, philmd, berrange; +Cc: robert.hu

On 2/12/20 11:52 PM, Robert Hoo wrote:
> And initialize buffer_is_zero() with it, when Intel AVX512F is
> available on host.
> 
> This function utilizes Intel AVX512 fundamental instructions which
> perform over previous AVX2 instructions.

Is it not still true that any AVX512 insn will cause the entire cpu package,
not just the current core, to drop frequency by 20%?

As far as I know one should only use the 512-bit instructions when you can
overcome that frequency drop, which seems unlikely in this case.  That said...


> +    if (unlikely(len < 64)) { /*buff less than 512 bits, unlikely*/
> +        return buffer_zero_int(buf, len);
> +    }

First, len < 64 has been eliminated already in select_accel_fn.
Second, len < 256 is not handled properly by the code below...


> +    /* Begin with an unaligned head of 64 bytes.  */
> +    t = _mm512_loadu_si512(buf);
> +    p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64);
> +    e = (__m512i *)(((uintptr_t)buf + len) & -64);
> +
> +    /* Loop over 64-byte aligned blocks of 256.  */
> +    while (p < e) {
> +        __builtin_prefetch(p);
> +        if (unlikely(_mm512_test_epi64_mask(t, t))) {
> +            return false;
> +        }
> +        t = p[-4] | p[-3] | p[-2] | p[-1];
> +        p += 4;
> +    }
> +
> +    t |= _mm512_loadu_si512(buf + len - 4 * 64);
> +    t |= _mm512_loadu_si512(buf + len - 3 * 64);
> +    t |= _mm512_loadu_si512(buf + len - 2 * 64);
> +    t |= _mm512_loadu_si512(buf + len - 1 * 64);

... because this final sequence loads 256 bytes.

Rather than make a second test vs 256 in buffer_zero_avx512, I wonder if it
would be better to have select_accel_fn do the job.  Have a global variable
buffer_accel_size alongside buffer_accel so there's only one branch
(mis)predict to worry about.

FWIW, something that the compiler should do, but doesn't currently, is use
vpternlogq to perform a 3-input OR.  Something like

    /* 0xfe -> orABC */
    t = _mm512_ternarylogic_epi64(t, p[-4], p[-3], 0xfe);
    t = _mm512_ternarylogic_epi64(t, p[-2], p[-1], 0xfe);


r~


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
  2020-02-13 18:20   ` Richard Henderson
@ 2020-02-24  7:07     ` Robert Hoo
  2020-02-24 16:13       ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Hoo @ 2020-02-24  7:07 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, pbonzini, laurent, philmd, berrange
  Cc: robert.hu

Thanks Richard:-)
Sorry for late reply.
On Thu, 2020-02-13 at 10:20 -0800, Richard Henderson wrote:
> On 2/12/20 11:52 PM, Robert Hoo wrote:
> > And initialize buffer_is_zero() with it, when Intel AVX512F is
> > available on host.
> > 
> > This function utilizes Intel AVX512 fundamental instructions which
> > perform over previous AVX2 instructions.
> 
> Is it not still true that any AVX512 insn will cause the entire cpu
> package,
> not just the current core, to drop frequency by 20%?
> 
> As far as I know one should only use the 512-bit instructions when
> you can
> overcome that frequency drop, which seems unlikely in this
> case.  That said...
> I don't think so. AVX512 has been applied in various places.
> > +    if (unlikely(len < 64)) { /*buff less than 512 bits,
> > unlikely*/
> > +        return buffer_zero_int(buf, len);
> > +    }
> 
> First, len < 64 has been eliminated already in select_accel_fn.
> Second, len < 256 is not handled properly by the code below...
> 
Right. I'm going to fix this in v2.
> 
> > +    /* Begin with an unaligned head of 64 bytes.  */
> > +    t = _mm512_loadu_si512(buf);
> > +    p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64);
> > +    e = (__m512i *)(((uintptr_t)buf + len) & -64);
> > +
> > +    /* Loop over 64-byte aligned blocks of 256.  */
> > +    while (p < e) {
> > +        __builtin_prefetch(p);
> > +        if (unlikely(_mm512_test_epi64_mask(t, t))) {
> > +            return false;
> > +        }
> > +        t = p[-4] | p[-3] | p[-2] | p[-1];
> > +        p += 4;
> > +    }
> > +
> > +    t |= _mm512_loadu_si512(buf + len - 4 * 64);
> > +    t |= _mm512_loadu_si512(buf + len - 3 * 64);
> > +    t |= _mm512_loadu_si512(buf + len - 2 * 64);
> > +    t |= _mm512_loadu_si512(buf + len - 1 * 64);
> 
> ... because this final sequence loads 256 bytes.
> 
> Rather than make a second test vs 256 in buffer_zero_avx512, I wonder
> if it
> would be better to have select_accel_fn do the job.  Have a global
> variable
> buffer_accel_size alongside buffer_accel so there's only one branch
> (mis)predict to worry about.
> 
Thanks Richard, very enlightening!
Inspired by your suggestion, I'm thinking go further: use immediate
rather than a global variable, so that saves 1 memory(/cache) access. 

#ifdef CONFIG_AVX512F_OPT   
#define OPTIMIZE_LEN    256
#else
#define OPTIMIZE_LEN    64
#endif
> FWIW, something that the compiler should do, but doesn't currently,
> is use
> vpternlogq to perform a 3-input OR.  Something like
> 
>     /* 0xfe -> orABC */
>     t = _mm512_ternarylogic_epi64(t, p[-4], p[-3], 0xfe);
>     t = _mm512_ternarylogic_epi64(t, p[-2], p[-1], 0xfe);
> 
Very enlightening. Yes, seems compiler doesn't do this.
I tried explicitly use this, however, looks it will have more
instructions generated, and unit test shows it performs less than then
conventional code.
Let me keep the conventional code for this moment, will ask around and
dig further outside this patch.

> 
> r~



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
  2020-02-24  7:07     ` Robert Hoo
@ 2020-02-24 16:13       ` Richard Henderson
  2020-02-25  7:34         ` Robert Hoo
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2020-02-24 16:13 UTC (permalink / raw)
  To: Robert Hoo, qemu-devel, pbonzini, laurent, philmd, berrange; +Cc: robert.hu

On 2/23/20 11:07 PM, Robert Hoo wrote:
> Inspired by your suggestion, I'm thinking go further: use immediate
> rather than a global variable, so that saves 1 memory(/cache) access. 
> 
> #ifdef CONFIG_AVX512F_OPT   
> #define OPTIMIZE_LEN    256
> #else
> #define OPTIMIZE_LEN    64
> #endif

With that, the testing in tests/test-bufferiszero.c, looping through the
implementations, is invalidated.  Because once you start compiling for avx512,
you're no longer testing sse2 et al with the same inputs.

IF we want to change the length to suit avx512, we would want to change it
unconditionally.  And then you could also tidy up avx2 to avoid the extra
comparisons there.


r~


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
  2020-02-24 16:13       ` Richard Henderson
@ 2020-02-25  7:34         ` Robert Hoo
  2020-02-25 15:29           ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Robert Hoo @ 2020-02-25  7:34 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, pbonzini, laurent, philmd, berrange
  Cc: robert.hu

On Mon, 2020-02-24 at 08:13 -0800, Richard Henderson wrote:
> On 2/23/20 11:07 PM, Robert Hoo wrote:
> > Inspired by your suggestion, I'm thinking go further: use immediate
> > rather than a global variable, so that saves 1 memory(/cache)
> > access. 
> > 
> > #ifdef CONFIG_AVX512F_OPT   
> > #define OPTIMIZE_LEN    256
> > #else
> > #define OPTIMIZE_LEN    64
> > #endif
> 
> With that, the testing in tests/test-bufferiszero.c, looping through
> the
> implementations, is invalidated.  Because once you start compiling
> for avx512,
> you're no longer testing sse2 et al with the same inputs.
> 
Right. Thanks pointing out. I didn't noticed that.
More precisely, it would cause no longer testing sse2 et al with < 256
length.

> IF we want to change the length to suit avx512, we would want to
> change it
> unconditionally.  And then you could also tidy up avx2 to avoid the
> extra
> comparisons there.
Considering the length's dependency on sse2/sse4/avx2/avx512 and the
algorithms, as well as future's possible changes, additions, I'd rather
roll back to your original suggestion, use a companion variable with
each accel_fn(). How do you like it?

> 
> 
> r~



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
  2020-02-25  7:34         ` Robert Hoo
@ 2020-02-25 15:29           ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2020-02-25 15:29 UTC (permalink / raw)
  To: Robert Hoo, qemu-devel, pbonzini, laurent, philmd, berrange; +Cc: robert.hu

On 2/24/20 11:34 PM, Robert Hoo wrote:
> Considering the length's dependency on sse2/sse4/avx2/avx512 and the
> algorithms, as well as future's possible changes, additions, I'd rather
> roll back to your original suggestion, use a companion variable with
> each accel_fn(). How do you like it?

How do I like it?

With a modification to init_accel() so that the function and the minimum length
are selected at the same time.


r~


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-02-25 15:30 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13  7:52 [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512() Robert Hoo
2020-02-13  7:52 ` [PATCH 1/2] configure: add configure option avx512f_opt Robert Hoo
2020-02-13  7:52 ` [PATCH 2/2] util: add util function buffer_zero_avx512() Robert Hoo
2020-02-13 10:30   ` Paolo Bonzini
2020-02-13 11:58     ` Robert Hoo
2020-02-13 18:20   ` Richard Henderson
2020-02-24  7:07     ` Robert Hoo
2020-02-24 16:13       ` Richard Henderson
2020-02-25  7:34         ` Robert Hoo
2020-02-25 15:29           ` Richard Henderson
2020-02-13  8:40 ` [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512() no-reply

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).