qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 0/3] add avx2 instruction optimization
@ 2016-01-20  9:05 Liang Li
  2016-01-20  9:05 ` [Qemu-devel] [PATCH v4 1/2] configure: detect ifunc and avx2 attribute Liang Li
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Liang Li @ 2016-01-20  9:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, Liang Li, mst, rth7680, dgilbert, quintela,
	stefanha, amit.shah, pbonzini, rth

buffer_find_nonzero_offset() is a hot function during live migration.
Now it use SSE2 instructions for optimization. For platform supports
AVX2 instructions, use the AVX2 instructions for optimization can help
to improve the performance about 30% comparing to SSE2.
Zero page check can be faster with this optimization, the test result
shows that for an 8GB RAM idle guest, this patch can help to shorten
the total live migration time about 6%.

This patch use the ifunc mechanism to select the proper function when
running, for platform supports AVX2, execute the AVX2 instructions,
else, execute the original instructions.

With this patch, the QEMU binary can run on both platforms support AVX2
or not.

Compiler which doesn't support the AVX2 and ifunc attribute can also build
the source code successfully.

v3 -> v4 changes:
  * Use the GCC #pragma to make things simple (Paolo's suggestion) 
  * Put avx2 related code in cutils.c (Richard's suggestion)
  * Change the configure, detect ifunc and avx2 attributes together

v2 -> v3 changes:
  * Detect the ifunc attribute support (Paolo's suggestion) 
  * Use the ifunc attribute instead of the inline asm (Richard's suggestion)
  * Change the configure (Juan's suggestion)

Liang Li (2):
  configure: detect ifunc and avx2 attribute
  cutils: add avx2 instruction optimization

 configure             |  20 +++++++++
 include/qemu-common.h |   8 +---
 util/cutils.c         | 118 ++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 135 insertions(+), 11 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH v4 1/2] configure: detect ifunc and avx2 attribute
  2016-01-20  9:05 [Qemu-devel] [PATCH v4 0/3] add avx2 instruction optimization Liang Li
@ 2016-01-20  9:05 ` Liang Li
  2016-01-20  9:50   ` Paolo Bonzini
  2016-01-20  9:05 ` [Qemu-devel] [PATCH v4 2/2] cutils: add avx2 instruction optimization Liang Li
  2016-01-20 10:22 ` [Qemu-devel] [PATCH v4 0/3] " 陈博
  2 siblings, 1 reply; 8+ messages in thread
From: Liang Li @ 2016-01-20  9:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, Liang Li, mst, rth7680, dgilbert, quintela,
	stefanha, amit.shah, pbonzini, rth

Detect if the compiler can support the ifun and avx2, if so, set
CONFIG_AVX2_OPT which will be used to turn on the avx2 instruction
optimization.

Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 configure | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/configure b/configure
index 44ac9ab..b7f4661 100755
--- a/configure
+++ b/configure
@@ -310,6 +310,7 @@ smartcard=""
 libusb=""
 usb_redir=""
 opengl=""
+avx2_opt=""
 zlib="yes"
 lzo=""
 snappy=""
@@ -1827,6 +1828,20 @@ EOF
 fi
 
 ##########################################
+# avx2 optimization requirement check
+
+cat > $TMPC << EOF
+static void bar(void) {}
+static void foo(void) __attribute__((ifunc("bar")));
+int main(void) { foo(); return 0; }
+EOF
+if compile_prog "" "-mavx2" ; then
+    avx2_opt="yes"
+else
+    avx2_opt="no"
+fi
+
+#########################################
 # zlib check
 
 if test "$zlib" != "no" ; then
@@ -4855,6 +4870,7 @@ echo "bzip2 support     $bzip2"
 echo "NUMA host support $numa"
 echo "tcmalloc support  $tcmalloc"
 echo "jemalloc support  $jemalloc"
+echo "avx2 optimization $avx2_opt"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -5236,6 +5252,10 @@ if test "$opengl" = "yes" ; then
   echo "OPENGL_LIBS=$opengl_libs" >> $config_host_mak
 fi
 
+if test "$avx2_opt" = "yes" ; then
+  echo "CONFIG_AVX2_OPT=y" >> $config_host_mak
+fi
+
 if test "$lzo" = "yes" ; then
   echo "CONFIG_LZO=y" >> $config_host_mak
 fi
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH v4 2/2] cutils: add avx2 instruction optimization
  2016-01-20  9:05 [Qemu-devel] [PATCH v4 0/3] add avx2 instruction optimization Liang Li
  2016-01-20  9:05 ` [Qemu-devel] [PATCH v4 1/2] configure: detect ifunc and avx2 attribute Liang Li
@ 2016-01-20  9:05 ` Liang Li
  2016-01-20  9:46   ` Paolo Bonzini
  2016-01-20 10:22 ` [Qemu-devel] [PATCH v4 0/3] " 陈博
  2 siblings, 1 reply; 8+ messages in thread
From: Liang Li @ 2016-01-20  9:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, Liang Li, mst, rth7680, dgilbert, quintela,
	stefanha, amit.shah, pbonzini, rth

buffer_find_nonzero_offset() is a hot function during live migration.
Now it use SSE2 instructions for optimization. For platform supports
AVX2 instructions, use AVX2 instructions for optimization can help
to improve the performance about 30% comparing to SSE2.

Zero page check can be faster with this optimization, the test result
shows that for an 8GiB RAM idle guest just boots, this patch can help
to shorten the total live migration time about 6%.

This patch use the ifunc mechanism to select the proper function when
running, for platform supports AVX2, execute the AVX2 instructions,
else, execute the original instructions.

Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 include/qemu-common.h |   8 +---
 util/cutils.c         | 118 ++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 115 insertions(+), 11 deletions(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 22b010c..f4c8c24 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -483,13 +483,7 @@ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
 #endif
 
 #define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
-static inline bool
-can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
-{
-    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
-                   * sizeof(VECTYPE)) == 0
-            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0);
-}
+bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len);
 size_t buffer_find_nonzero_offset(const void *buf, size_t len);
 
 /*
diff --git a/util/cutils.c b/util/cutils.c
index cfeb848..5c8ee5c 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -161,6 +161,14 @@ int qemu_fdatasync(int fd)
 #endif
 }
 
+static bool
+can_use_buffer_find_nonzero_offset_inner(const void *buf, size_t len)
+{
+    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
+                   * sizeof(VECTYPE)) == 0
+            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0);
+}
+
 /*
  * Searches for an area with non-zero content in a buffer
  *
@@ -169,8 +177,8 @@ int qemu_fdatasync(int fd)
  * and addr must be a multiple of sizeof(VECTYPE) due to
  * restriction of optimizations in this function.
  *
- * can_use_buffer_find_nonzero_offset() can be used to check
- * these requirements.
+ * can_use_buffer_find_nonzero_offset_inner() can be used to
+ * check these requirements.
  *
  * The return value is the offset of the non-zero area rounded
  * down to a multiple of sizeof(VECTYPE) for the first
@@ -181,13 +189,13 @@ int qemu_fdatasync(int fd)
  * If the buffer is all zero the return value is equal to len.
  */
 
-size_t buffer_find_nonzero_offset(const void *buf, size_t len)
+static size_t buffer_find_nonzero_offset_inner(const void *buf, size_t len)
 {
     const VECTYPE *p = buf;
     const VECTYPE zero = (VECTYPE){0};
     size_t i;
 
-    assert(can_use_buffer_find_nonzero_offset(buf, len));
+    assert(can_use_buffer_find_nonzero_offset_inner(buf, len));
 
     if (!len) {
         return 0;
@@ -216,6 +224,108 @@ size_t buffer_find_nonzero_offset(const void *buf, size_t len)
     return i * sizeof(VECTYPE);
 }
 
+#ifdef CONFIG_AVX2_OPT
+#pragma GCC push_options
+#pragma GCC target("avx2")
+#include <cpuid.h>
+#include <immintrin.h>
+
+#define AVX2_VECTYPE        __m256i
+#define AVX2_SPLAT(p)       _mm256_set1_epi8(*(p))
+#define AVX2_ALL_EQ(v1, v2) \
+    (_mm256_movemask_epi8(_mm256_cmpeq_epi8(v1, v2)) == 0xFFFFFFFF)
+#define AVX2_VEC_OR(v1, v2) (_mm256_or_si256(v1, v2))
+
+static bool
+can_use_buffer_find_nonzero_offset_avx2(const void *buf, size_t len)
+{
+    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
+                   * sizeof(AVX2_VECTYPE)) == 0
+            && ((uintptr_t) buf) % sizeof(AVX2_VECTYPE) == 0);
+}
+
+static size_t buffer_find_nonzero_offset_avx2(const void *buf, size_t len)
+{
+    const AVX2_VECTYPE *p = buf;
+    const AVX2_VECTYPE zero = (AVX2_VECTYPE){0};
+    size_t i;
+
+    assert(can_use_buffer_find_nonzero_offset_avx2(buf, len));
+
+    if (!len) {
+        return 0;
+    }
+
+    for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
+        if (!AVX2_ALL_EQ(p[i], zero)) {
+            return i * sizeof(AVX2_VECTYPE);
+        }
+    }
+
+    for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
+         i < len / sizeof(AVX2_VECTYPE);
+         i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
+        AVX2_VECTYPE tmp0 = AVX2_VEC_OR(p[i + 0], p[i + 1]);
+        AVX2_VECTYPE tmp1 = AVX2_VEC_OR(p[i + 2], p[i + 3]);
+        AVX2_VECTYPE tmp2 = AVX2_VEC_OR(p[i + 4], p[i + 5]);
+        AVX2_VECTYPE tmp3 = AVX2_VEC_OR(p[i + 6], p[i + 7]);
+        AVX2_VECTYPE tmp01 = AVX2_VEC_OR(tmp0, tmp1);
+        AVX2_VECTYPE tmp23 = AVX2_VEC_OR(tmp2, tmp3);
+        if (!AVX2_ALL_EQ(AVX2_VEC_OR(tmp01, tmp23), zero)) {
+            break;
+        }
+    }
+
+    return i * sizeof(AVX2_VECTYPE);
+}
+
+static bool avx2_support(void)
+{
+    int a, b, c, d;
+
+    if (__get_cpuid_max(0, NULL) < 7) {
+        return false;
+    }
+
+    __cpuid_count(7, 0, a, b, c, d);
+
+    return b & bit_AVX2;
+}
+
+bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len) \
+         __attribute__ ((ifunc("can_use_buffer_find_nonzero_offset_ifunc")));
+size_t buffer_find_nonzero_offset(const void *buf, size_t len) \
+         __attribute__ ((ifunc("buffer_find_nonzero_offset_ifunc")));
+
+static void *buffer_find_nonzero_offset_ifunc(void)
+{
+    typeof(buffer_find_nonzero_offset) *func = (avx2_support()) ?
+        buffer_find_nonzero_offset_avx2 : buffer_find_nonzero_offset_inner;
+
+    return func;
+}
+
+static void *can_use_buffer_find_nonzero_offset_ifunc(void)
+{
+    typeof(can_use_buffer_find_nonzero_offset) *func = (avx2_support()) ?
+        can_use_buffer_find_nonzero_offset_avx2 :
+        can_use_buffer_find_nonzero_offset_inner;
+
+    return func;
+}
+#pragma GCC pop_options
+#else
+bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    return can_use_buffer_find_nonzero_offset_inner(buf, len);
+}
+
+size_t buffer_find_nonzero_offset(const void *buf, size_t len)
+{
+    return buffer_find_nonzero_offset_inner(buf, len);
+}
+#endif
+
 /*
  * Checks if a buffer is all zeroes
  *
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/2] cutils: add avx2 instruction optimization
  2016-01-20  9:05 ` [Qemu-devel] [PATCH v4 2/2] cutils: add avx2 instruction optimization Liang Li
@ 2016-01-20  9:46   ` Paolo Bonzini
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2016-01-20  9:46 UTC (permalink / raw)
  To: Liang Li, qemu-devel
  Cc: peter.maydell, mst, rth7680, dgilbert, quintela, stefanha,
	amit.shah, rth



On 20/01/2016 10:05, Liang Li wrote:
> buffer_find_nonzero_offset() is a hot function during live migration.
> Now it use SSE2 instructions for optimization. For platform supports
> AVX2 instructions, use AVX2 instructions for optimization can help
> to improve the performance about 30% comparing to SSE2.
> 
> Zero page check can be faster with this optimization, the test result
> shows that for an 8GiB RAM idle guest just boots, this patch can help
> to shorten the total live migration time about 6%.
> 
> This patch use the ifunc mechanism to select the proper function when
> running, for platform supports AVX2, execute the AVX2 instructions,
> else, execute the original instructions.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

> ---
>  include/qemu-common.h |   8 +---
>  util/cutils.c         | 118 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 115 insertions(+), 11 deletions(-)
> 
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 22b010c..f4c8c24 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -483,13 +483,7 @@ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
>  #endif
>  
>  #define BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR 8
> -static inline bool
> -can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
> -{
> -    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> -                   * sizeof(VECTYPE)) == 0
> -            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0);
> -}
> +bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len);
>  size_t buffer_find_nonzero_offset(const void *buf, size_t len);
>  
>  /*
> diff --git a/util/cutils.c b/util/cutils.c
> index cfeb848..5c8ee5c 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -161,6 +161,14 @@ int qemu_fdatasync(int fd)
>  #endif
>  }
>  
> +static bool
> +can_use_buffer_find_nonzero_offset_inner(const void *buf, size_t len)
> +{
> +    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> +                   * sizeof(VECTYPE)) == 0
> +            && ((uintptr_t) buf) % sizeof(VECTYPE) == 0);
> +}
> +
>  /*
>   * Searches for an area with non-zero content in a buffer
>   *
> @@ -169,8 +177,8 @@ int qemu_fdatasync(int fd)
>   * and addr must be a multiple of sizeof(VECTYPE) due to
>   * restriction of optimizations in this function.
>   *
> - * can_use_buffer_find_nonzero_offset() can be used to check
> - * these requirements.
> + * can_use_buffer_find_nonzero_offset_inner() can be used to
> + * check these requirements.
>   *
>   * The return value is the offset of the non-zero area rounded
>   * down to a multiple of sizeof(VECTYPE) for the first
> @@ -181,13 +189,13 @@ int qemu_fdatasync(int fd)
>   * If the buffer is all zero the return value is equal to len.
>   */
>  
> -size_t buffer_find_nonzero_offset(const void *buf, size_t len)
> +static size_t buffer_find_nonzero_offset_inner(const void *buf, size_t len)
>  {
>      const VECTYPE *p = buf;
>      const VECTYPE zero = (VECTYPE){0};
>      size_t i;
>  
> -    assert(can_use_buffer_find_nonzero_offset(buf, len));
> +    assert(can_use_buffer_find_nonzero_offset_inner(buf, len));
>  
>      if (!len) {
>          return 0;
> @@ -216,6 +224,108 @@ size_t buffer_find_nonzero_offset(const void *buf, size_t len)
>      return i * sizeof(VECTYPE);
>  }
>  
> +#ifdef CONFIG_AVX2_OPT
> +#pragma GCC push_options
> +#pragma GCC target("avx2")
> +#include <cpuid.h>
> +#include <immintrin.h>
> +
> +#define AVX2_VECTYPE        __m256i
> +#define AVX2_SPLAT(p)       _mm256_set1_epi8(*(p))
> +#define AVX2_ALL_EQ(v1, v2) \
> +    (_mm256_movemask_epi8(_mm256_cmpeq_epi8(v1, v2)) == 0xFFFFFFFF)
> +#define AVX2_VEC_OR(v1, v2) (_mm256_or_si256(v1, v2))
> +
> +static bool
> +can_use_buffer_find_nonzero_offset_avx2(const void *buf, size_t len)
> +{
> +    return (len % (BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> +                   * sizeof(AVX2_VECTYPE)) == 0
> +            && ((uintptr_t) buf) % sizeof(AVX2_VECTYPE) == 0);
> +}
> +
> +static size_t buffer_find_nonzero_offset_avx2(const void *buf, size_t len)
> +{
> +    const AVX2_VECTYPE *p = buf;
> +    const AVX2_VECTYPE zero = (AVX2_VECTYPE){0};
> +    size_t i;
> +
> +    assert(can_use_buffer_find_nonzero_offset_avx2(buf, len));
> +
> +    if (!len) {
> +        return 0;
> +    }
> +
> +    for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
> +        if (!AVX2_ALL_EQ(p[i], zero)) {
> +            return i * sizeof(AVX2_VECTYPE);
> +        }
> +    }
> +
> +    for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
> +         i < len / sizeof(AVX2_VECTYPE);
> +         i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
> +        AVX2_VECTYPE tmp0 = AVX2_VEC_OR(p[i + 0], p[i + 1]);
> +        AVX2_VECTYPE tmp1 = AVX2_VEC_OR(p[i + 2], p[i + 3]);
> +        AVX2_VECTYPE tmp2 = AVX2_VEC_OR(p[i + 4], p[i + 5]);
> +        AVX2_VECTYPE tmp3 = AVX2_VEC_OR(p[i + 6], p[i + 7]);
> +        AVX2_VECTYPE tmp01 = AVX2_VEC_OR(tmp0, tmp1);
> +        AVX2_VECTYPE tmp23 = AVX2_VEC_OR(tmp2, tmp3);
> +        if (!AVX2_ALL_EQ(AVX2_VEC_OR(tmp01, tmp23), zero)) {
> +            break;
> +        }
> +    }
> +
> +    return i * sizeof(AVX2_VECTYPE);
> +}
> +
> +static bool avx2_support(void)
> +{
> +    int a, b, c, d;
> +
> +    if (__get_cpuid_max(0, NULL) < 7) {
> +        return false;
> +    }
> +
> +    __cpuid_count(7, 0, a, b, c, d);
> +
> +    return b & bit_AVX2;
> +}
> +
> +bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len) \
> +         __attribute__ ((ifunc("can_use_buffer_find_nonzero_offset_ifunc")));
> +size_t buffer_find_nonzero_offset(const void *buf, size_t len) \
> +         __attribute__ ((ifunc("buffer_find_nonzero_offset_ifunc")));
> +
> +static void *buffer_find_nonzero_offset_ifunc(void)
> +{
> +    typeof(buffer_find_nonzero_offset) *func = (avx2_support()) ?
> +        buffer_find_nonzero_offset_avx2 : buffer_find_nonzero_offset_inner;
> +
> +    return func;
> +}
> +
> +static void *can_use_buffer_find_nonzero_offset_ifunc(void)
> +{
> +    typeof(can_use_buffer_find_nonzero_offset) *func = (avx2_support()) ?
> +        can_use_buffer_find_nonzero_offset_avx2 :
> +        can_use_buffer_find_nonzero_offset_inner;
> +
> +    return func;
> +}
> +#pragma GCC pop_options
> +#else
> +bool can_use_buffer_find_nonzero_offset(const void *buf, size_t len)
> +{
> +    return can_use_buffer_find_nonzero_offset_inner(buf, len);
> +}
> +
> +size_t buffer_find_nonzero_offset(const void *buf, size_t len)
> +{
> +    return buffer_find_nonzero_offset_inner(buf, len);
> +}
> +#endif
> +
>  /*
>   * Checks if a buffer is all zeroes
>   *
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/2] configure: detect ifunc and avx2 attribute
  2016-01-20  9:05 ` [Qemu-devel] [PATCH v4 1/2] configure: detect ifunc and avx2 attribute Liang Li
@ 2016-01-20  9:50   ` Paolo Bonzini
  2016-01-20 10:43     ` Li, Liang Z
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2016-01-20  9:50 UTC (permalink / raw)
  To: Liang Li, qemu-devel
  Cc: peter.maydell, mst, rth7680, dgilbert, quintela, stefanha,
	amit.shah, rth



On 20/01/2016 10:05, Liang Li wrote:
> Detect if the compiler can support the ifun and avx2, if so, set
> CONFIG_AVX2_OPT which will be used to turn on the avx2 instruction
> optimization.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> ---
>  configure | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/configure b/configure
> index 44ac9ab..b7f4661 100755
> --- a/configure
> +++ b/configure
> @@ -310,6 +310,7 @@ smartcard=""
>  libusb=""
>  usb_redir=""
>  opengl=""
> +avx2_opt=""
>  zlib="yes"
>  lzo=""
>  snappy=""
> @@ -1827,6 +1828,20 @@ EOF
>  fi
>  
>  ##########################################
> +# avx2 optimization requirement check
> +
> +cat > $TMPC << EOF
> +static void bar(void) {}

Might be nicer to use "void *" and return an actual function name:

static void bar(void) {}
static void *bar_ifunc(void) { return (void *)bar; }
void foo(void) __attribute__((ifunc("bar_ifunc")));

And also you probably should use "readelf --syms ... | grep IFUNC.*foo"
to check that the attribute was not ignored.

Paolo


> +static void foo(void) __attribute__((ifunc("bar")));
> +int main(void) { foo(); return 0; }
> +EOF
> +if compile_prog "" "-mavx2" ; then
> +    avx2_opt="yes"
> +else
> +    avx2_opt="no"
> +fi
> +
> +#########################################
>  # zlib check
>  
>  if test "$zlib" != "no" ; then
> @@ -4855,6 +4870,7 @@ echo "bzip2 support     $bzip2"
>  echo "NUMA host support $numa"
>  echo "tcmalloc support  $tcmalloc"
>  echo "jemalloc support  $jemalloc"
> +echo "avx2 optimization $avx2_opt"
>  
>  if test "$sdl_too_old" = "yes"; then
>  echo "-> Your SDL version is too old - please upgrade to have SDL support"
> @@ -5236,6 +5252,10 @@ if test "$opengl" = "yes" ; then
>    echo "OPENGL_LIBS=$opengl_libs" >> $config_host_mak
>  fi
>  
> +if test "$avx2_opt" = "yes" ; then
> +  echo "CONFIG_AVX2_OPT=y" >> $config_host_mak
> +fi
> +
>  if test "$lzo" = "yes" ; then
>    echo "CONFIG_LZO=y" >> $config_host_mak
>  fi
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/3] add avx2 instruction optimization
  2016-01-20  9:05 [Qemu-devel] [PATCH v4 0/3] add avx2 instruction optimization Liang Li
  2016-01-20  9:05 ` [Qemu-devel] [PATCH v4 1/2] configure: detect ifunc and avx2 attribute Liang Li
  2016-01-20  9:05 ` [Qemu-devel] [PATCH v4 2/2] cutils: add avx2 instruction optimization Liang Li
@ 2016-01-20 10:22 ` 陈博
  2016-01-20 15:25   ` Eric Blake
  2 siblings, 1 reply; 8+ messages in thread
From: 陈博 @ 2016-01-20 10:22 UTC (permalink / raw)
  To: Liang Li
  Cc: peter.maydell, quintela, rth7680, qemu-devel, dgilbert, mst,
	stefanha, amit.shah, pbonzini, rth

[-- Attachment #1: Type: text/plain, Size: 930 bytes --]

Sorry for disturbing by reply, don't know why I'm not able to send a new mail.
————

Hi folks,

Could you enlighten me how to achieve proportional IO sharing by using cgroup, instead of qemu's io-throttling?

My qemu config is like: -drive file=$DISKFILe,if=none,format=qcow2,cache=none,aio=native -device virtio-blk-pci...

Test command inside vm is like: dd if=/dev/vdc of=/dev/null iflag=direct

Cgroup blkio weight of the qemu process is properly configured as well.

But no matter how change the proportion, such as vm1=400 and vm2=100, I can only get the equal IO speed.

Wondering cgroup blkio.weight or blkio.weight_device has no effect on qemu?


PS. cache=writethrough aio=threads is also tested, the same results. 


- Bob



在 2016年1月20日,下午5:05,Liang Li <liang.z.li@intel.com> 写道:

> buffer_find_nonzero_offset() is a hot function during live migration.
> 


[-- Attachment #2: Type: text/html, Size: 5068 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/2] configure: detect ifunc and avx2 attribute
  2016-01-20  9:50   ` Paolo Bonzini
@ 2016-01-20 10:43     ` Li, Liang Z
  0 siblings, 0 replies; 8+ messages in thread
From: Li, Liang Z @ 2016-01-20 10:43 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: peter.maydell, mst, rth7680, dgilbert, quintela, stefanha,
	amit.shah, rth

> On 20/01/2016 10:05, Liang Li wrote:
> > Detect if the compiler can support the ifun and avx2, if so, set
> > CONFIG_AVX2_OPT which will be used to turn on the avx2 instruction
> > optimization.
> >
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > ---
> >  configure | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/configure b/configure
> > index 44ac9ab..b7f4661 100755
> > --- a/configure
> > +++ b/configure
> > @@ -310,6 +310,7 @@ smartcard=""
> >  libusb=""
> >  usb_redir=""
> >  opengl=""
> > +avx2_opt=""
> >  zlib="yes"
> >  lzo=""
> >  snappy=""
> > @@ -1827,6 +1828,20 @@ EOF
> >  fi
> >
> >  ##########################################
> > +# avx2 optimization requirement check
> > +
> > +cat > $TMPC << EOF
> > +static void bar(void) {}
> 
> Might be nicer to use "void *" and return an actual function name:
> 
> static void bar(void) {}
> static void *bar_ifunc(void) { return (void *)bar; } void foo(void)
> __attribute__((ifunc("bar_ifunc")));
> 
> And also you probably should use "readelf --syms ... | grep IFUNC.*foo"
> to check that the attribute was not ignored.
> 
> Paolo
> 

Got it, will change in the next version.

Liang

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/3] add avx2 instruction optimization
  2016-01-20 10:22 ` [Qemu-devel] [PATCH v4 0/3] " 陈博
@ 2016-01-20 15:25   ` Eric Blake
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Blake @ 2016-01-20 15:25 UTC (permalink / raw)
  To: 陈博, Liang Li
  Cc: peter.maydell, mst, rth7680, qemu-devel, dgilbert, quintela,
	stefanha, amit.shah, pbonzini, rth

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

On 01/20/2016 03:22 AM, 陈博 wrote:
> Sorry for disturbing by reply, don't know why I'm not able to send a new mail.
> ————
> 
> Hi folks,
> 
> Could you enlighten me how to achieve proportional IO sharing by using cgroup, instead of qemu's io-throttling?

Please don't commandeer an unrelated thread for your question.  Also,
your question has come through multiple times; in addition to this
incorrectly threaded reply, you also have the same question at least 9
more times right here:
https://lists.gnu.org/archive/html/qemu-devel/2016-01/threads.html#03374

Remember, the list moderates first-time posts (whether or not the poster
is subscribed) as well as allowing non-subscriber posts.  Although you
may not immediately see your post land on the list archives, you should
wait around 24 hours before assuming your message is lost and attempting
to repost, to give the moderators time to let your messages through.
Otherwise, you end up storming the list.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-01-20 15:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-20  9:05 [Qemu-devel] [PATCH v4 0/3] add avx2 instruction optimization Liang Li
2016-01-20  9:05 ` [Qemu-devel] [PATCH v4 1/2] configure: detect ifunc and avx2 attribute Liang Li
2016-01-20  9:50   ` Paolo Bonzini
2016-01-20 10:43     ` Li, Liang Z
2016-01-20  9:05 ` [Qemu-devel] [PATCH v4 2/2] cutils: add avx2 instruction optimization Liang Li
2016-01-20  9:46   ` Paolo Bonzini
2016-01-20 10:22 ` [Qemu-devel] [PATCH v4 0/3] " 陈博
2016-01-20 15:25   ` Eric Blake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).