All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU
@ 2018-03-09 13:29 Paolo Bonzini
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 1/4] rcutorture: remove synchronize_rcu from readers Paolo Bonzini
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Paolo Bonzini @ 2018-03-09 13:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G . Cota

I didn't have time to check whether it improves performance (not
too likely, since QEMU tends to have pretty long RCU critical
sections), but it cannot hurt either. :)

For microbenchmark results, see patch 4.

Paolo


Paolo Bonzini (4):
  rcutorture: remove synchronize_rcu from readers
  rcu: make memory barriers more explicit
  membarrier: introduce qemu/sys_membarrier.h
  membarrier: add --enable-membarrier

 configure                     | 41 ++++++++++++++++++++++++++++++++++-
 include/qemu/rcu.h            | 16 ++++++++++++--
 include/qemu/sys_membarrier.h | 27 +++++++++++++++++++++++
 tests/rcutorture.c            |  4 ---
 util/Makefile.objs            |  1 +
 util/rcu.c                    | 15 +++++++++----
 util/sys_membarrier.c         | 50 +++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 143 insertions(+), 11 deletions(-)
 create mode 100644 include/qemu/sys_membarrier.h
 create mode 100644 util/sys_membarrier.c

-- 
2.14.3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 1/4] rcutorture: remove synchronize_rcu from readers
  2018-03-09 13:29 [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU Paolo Bonzini
@ 2018-03-09 13:29 ` Paolo Bonzini
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 2/4] rcu: make memory barriers more explicit Paolo Bonzini
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2018-03-09 13:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G . Cota

This gives much worse numbers for readers, especially if synchronize_rcu
is made more expensive as is the case with --enable-membarrier.  Before:

   $ tests/rcutorture 10 stress 10
   n_reads: 98304  n_updates: 529  n_mberror: 0
   rcu_stress_count: 98302 2 0 0 0 0 0 0 0 0 0

After:

   $ tests/rcutorture 10 stress 10
   n_reads: 165158482  n_updates: 429  n_mberror: 0
   rcu_stress_count: 165154364 4118 0 0 0 0 0 0 0 0 0

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/rcutorture.c | 4 ----
 1 file changed, 4 deletion(-)

diff --git a/tests/rcutorture.c b/tests/rcutorture.c
index 4002ecf123..2a7201549a 100644
--- a/tests/rcutorture.c
+++ b/tests/rcutorture.c
@@ -238,7 +238,6 @@ long long rcu_stress_count[RCU_STRESS_PIPE_LEN + 1];
 static void *rcu_read_stress_test(void *arg)
 {
     int i;
-    int itercnt = 0;
     struct rcu_stress *p;
     int pc;
     long long n_reads_local = 0;
@@ -269,9 +269,6 @@ static void *rcu_read_stress_test(void *arg)
         }
         rcu_stress_local[pc]++;
         n_reads_local++;
-        if ((++itercnt % 0x1000) == 0) {
-            synchronize_rcu();
-        }
     }
     qemu_mutex_lock(&counts_mutex);
     n_reads += n_reads_local;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 2/4] rcu: make memory barriers more explicit
  2018-03-09 13:29 [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU Paolo Bonzini
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 1/4] rcutorture: remove synchronize_rcu from readers Paolo Bonzini
@ 2018-03-09 13:29 ` Paolo Bonzini
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 3/4] membarrier: introduce qemu/sys_membarrier.h Paolo Bonzini
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2018-03-09 13:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G . Cota

Prepare for introducing smp_mb_placeholder() and smp_mb_global().
The new smp_mb() in synchronize_rcu() is not strictly necessary, since
the first atomic_mb_set for rcu_gp_ctr provides the required ordering.
However, synchronize_rcu is not performance critical, and it *will* be
necessary to introduce a smp_mb_global before calling wait_for_readers().

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/rcu.h | 15 +++++++++++++--
 util/rcu.c         | 12 +++++++++---
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index f19413d649..625f09ac09 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -79,7 +79,10 @@ static inline void rcu_read_lock(void)
     }
 
     ctr = atomic_read(&rcu_gp_ctr);
-    atomic_xchg(&p_rcu_reader->ctr, ctr);
+    atomic_set(&p_rcu_reader->ctr, ctr);
+
+    /* Write p_rcu_reader->ctr before reading RCU-protected pointers.  */
+    smp_mb();
 }
 
 static inline void rcu_read_unlock(void)
@@ -91,7 +94,15 @@ static inline void rcu_read_unlock(void)
         return;
     }
 
-    atomic_xchg(&p_rcu_reader->ctr, 0);
+    /* Ensure that the critical section is seen to precede the
+     * store to p_rcu_reader->ctr.  Together with the following
+     * smp_mb(), this ensures writes to p_rcu_reader->ctr
+     * are sequentially consistent.
+     */
+    atomic_store_release(&p_rcu_reader->ctr, 0);
+
+    /* Write p_rcu_reader->ctr before reading p_rcu_reader->waiting.  */
+    smp_mb();
     if (unlikely(atomic_read(&p_rcu_reader->waiting))) {
         atomic_set(&p_rcu_reader->waiting, false);
         qemu_event_set(&rcu_gp_event);
diff --git a/util/rcu.c b/util/rcu.c
index f4d09c8304..7366dc50dd 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -92,8 +92,9 @@ static void wait_for_readers(void)
             atomic_set(&index->waiting, true);
         }
 
-        /* Here, order the stores to index->waiting before the
-         * loads of index->ctr.
+        /* Here, order the stores to index->waiting before the loads of
+         * index->ctr.  Pairs with smp_mb() in rcu_read_unlock(),
+         * ensuring that the loads of index->ctr are sequentially consistent.
          */
         smp_mb();
 
@@ -142,8 +143,13 @@ static void wait_for_readers(void)
 void synchronize_rcu(void)
 {
     qemu_mutex_lock(&rcu_sync_lock);
-    qemu_mutex_lock(&rcu_registry_lock);
 
+    /* Write RCU-protected pointers before reading p_rcu_reader->ctr.
+     * Pairs with smp_mb() in rcu_read_lock().
+     */
+    smp_mb();
+
+    qemu_mutex_lock(&rcu_registry_lock);
     if (!QLIST_EMPTY(&registry)) {
         /* In either case, the atomic_mb_set below blocks stores that free
          * old RCU-protected pointers.
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 3/4] membarrier: introduce qemu/sys_membarrier.h
  2018-03-09 13:29 [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU Paolo Bonzini
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 1/4] rcutorture: remove synchronize_rcu from readers Paolo Bonzini
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 2/4] rcu: make memory barriers more explicit Paolo Bonzini
@ 2018-03-09 13:29 ` Paolo Bonzini
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 4/4] membarrier: add --enable-membarrier Paolo Bonzini
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2018-03-09 13:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G . Cota

This new header file provides heavy-weight "global" memory barriers that
enforce memory ordering on each running thread belonging to the current
process.  For now, use a dummy implementation that issues memory barriers
on both sides (matching what QEMU has been doing so far).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/qemu/rcu.h            |  7 ++++---
 include/qemu/sys_membarrier.h | 17 +++++++++++++++++
 util/rcu.c                    |  9 +++++----
 3 files changed, 26 insertions(+), 7 deletions(-)
 create mode 100644 include/qemu/sys_membarrier.h

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 625f09ac09..22876d1428 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -27,6 +27,7 @@
 #include "qemu/thread.h"
 #include "qemu/queue.h"
 #include "qemu/atomic.h"
+#include "qemu/sys_membarrier.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -82,7 +83,7 @@ static inline void rcu_read_lock(void)
     atomic_set(&p_rcu_reader->ctr, ctr);
 
     /* Write p_rcu_reader->ctr before reading RCU-protected pointers.  */
-    smp_mb();
+    smp_mb_placeholder();
 }
 
 static inline void rcu_read_unlock(void)
@@ -96,13 +97,13 @@ static inline void rcu_read_unlock(void)
 
     /* Ensure that the critical section is seen to precede the
      * store to p_rcu_reader->ctr.  Together with the following
-     * smp_mb(), this ensures writes to p_rcu_reader->ctr
+     * smp_mb_placeholder(), this ensures writes to p_rcu_reader->ctr
      * are sequentially consistent.
      */
     atomic_store_release(&p_rcu_reader->ctr, 0);
 
     /* Write p_rcu_reader->ctr before reading p_rcu_reader->waiting.  */
-    smp_mb();
+    smp_mb_placeholder();
     if (unlikely(atomic_read(&p_rcu_reader->waiting))) {
         atomic_set(&p_rcu_reader->waiting, false);
         qemu_event_set(&rcu_gp_event);
diff --git a/include/qemu/sys_membarrier.h b/include/qemu/sys_membarrier.h
new file mode 100644
index 0000000000..9ce7f5210b
--- /dev/null
+++ b/include/qemu/sys_membarrier.h
@@ -0,0 +1,17 @@
+/*
+ * Process-global memory barriers
+ *
+ * Copyright (c) 2018 Red Hat, Inc.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ */
+
+#ifndef QEMU_SYS_MEMBARRIER_H
+#define QEMU_SYS_MEMBARRIER_H 1
+
+/* Keep it simple, execute a real memory barrier on both sides.  */
+static inline void smp_mb_global_init(void) {}
+#define smp_mb_global()            smp_mb()
+#define smp_mb_placeholder()       smp_mb()
+
+#endif
diff --git a/util/rcu.c b/util/rcu.c
index 7366dc50dd..5676c22bd1 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -93,10 +93,10 @@ static void wait_for_readers(void)
         }
 
         /* Here, order the stores to index->waiting before the loads of
-         * index->ctr.  Pairs with smp_mb() in rcu_read_unlock(),
+         * index->ctr.  Pairs with smp_mb_placeholder() in rcu_read_unlock(),
          * ensuring that the loads of index->ctr are sequentially consistent.
          */
-        smp_mb();
+        smp_mb_global();
 
         QLIST_FOREACH_SAFE(index, &registry, node, tmp) {
             if (!rcu_gp_ongoing(&index->ctr)) {
@@ -145,9 +145,9 @@ void synchronize_rcu(void)
     qemu_mutex_lock(&rcu_sync_lock);
 
     /* Write RCU-protected pointers before reading p_rcu_reader->ctr.
-     * Pairs with smp_mb() in rcu_read_lock().
+     * Pairs with smp_mb_placeholder() in rcu_read_lock().
      */
-    smp_mb();
+    smp_mb_global();
 
     qemu_mutex_lock(&rcu_registry_lock);
     if (!QLIST_EMPTY(&registry)) {
@@ -376,6 +376,7 @@ static void rcu_init_child(void)
 
 static void __attribute__((__constructor__)) rcu_init(void)
 {
+    smp_mb_global_init();
 #ifdef CONFIG_POSIX
     pthread_atfork(rcu_init_lock, rcu_init_unlock, rcu_init_child);
 #endif
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 4/4] membarrier: add --enable-membarrier
  2018-03-09 13:29 [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU Paolo Bonzini
                   ` (2 preceding siblings ...)
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 3/4] membarrier: introduce qemu/sys_membarrier.h Paolo Bonzini
@ 2018-03-09 13:29 ` Paolo Bonzini
  2018-03-22  1:29   ` Emilio G. Cota
  2018-03-09 13:37 ` [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU no-reply
  2018-03-22  1:03 ` Emilio G. Cota
  5 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2018-03-09 13:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Emilio G . Cota

Actually enable the global memory barriers if supported by the OS.
Because only recent versions of Linux include the support, they
are disabled by default.  Note that it also has to be disabled
for QEMU to run under Wine.

Before this patch, rcutorture reports 85 ns/read for my machine,
after the patch it reports 12.5 ns/read.  On the other hand updates
go from 50 *micro*seconds to 20 *milli*seconds.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 configure                     | 41 ++++++++++++++++++++++++++++++++++-
 include/qemu/sys_membarrier.h | 10 +++++++++
 util/Makefile.objs            |  1 +
 util/sys_membarrier.c         | 50 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 util/sys_membarrier.c

diff --git a/configure b/configure
index 6f3921c02a..4fcd18f136 100755
--- a/configure
+++ b/configure
@@ -342,7 +342,7 @@ attr=""
 libattr=""
 xfs=""
 tcg="yes"
-
+membarrier=""
 vhost_net="no"
 vhost_crypto="no"
 vhost_scsi="no"
@@ -1161,6 +1161,10 @@ for opt do
   ;;
   --enable-attr) attr="yes"
   ;;
+  --disable-membarrier) membarrier="no"
+  ;;
+  --enable-membarrier) membarrier="yes"
+  ;;
   --disable-blobs) blobs="no"
   ;;
   --with-pkgversion=*) pkgversion=" ($optarg)"
@@ -1577,6 +1581,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   xen-pci-passthrough
   brlapi          BrlAPI (Braile)
   curl            curl connectivity
+  membarrier      membarrier system call (for Linux 4.14+ or Windows)
   fdt             fdt device tree
   bluez           bluez stack connectivity
   kvm             KVM acceleration support
@@ -5137,6 +5142,35 @@ if compile_prog "" "" ; then
     have_fsxattr=yes
 fi
 
+##########################################
+# check for usable membarrier system call
+if test "$membarrier" = "yes"; then
+    have_membarrier=no
+    if test "$mingw32" = "yes" ; then
+        have_membarrier=yes
+    elif test "$linux" = "yes" ; then
+        cat > $TMPC << EOF
+    #include <linux/membarrier.h>
+    #include <sys/syscall.h>
+    #include <unistd.h>
+    int main(void) {
+        syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);
+        syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
+    }
+EOF
+        if compile_prog "" "" ; then
+            have_membarrier=yes
+        fi
+    fi
+    if test "$have_membarrier" = "no"; then
+      feature_not_found "membarrier" "membarrier system call not available"
+    fi
+else
+    # Do not enable it by default even for Mingw32, because it doesn't
+    # work on Wine.
+    membarrier=no
+fi
+
 ##########################################
 # check if rtnetlink.h exists and is useful
 have_rtnetlink=no
@@ -5763,6 +5798,7 @@ fi
 echo "malloc trim support $malloc_trim"
 echo "RDMA support      $rdma"
 echo "fdt support       $fdt"
+echo "membarrier        $membarrier"
 echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
 echo "madvise           $madvise"
@@ -6245,6 +6281,9 @@ fi
 if test "$fdt" = "yes" ; then
   echo "CONFIG_FDT=y" >> $config_host_mak
 fi
+if test "$membarrier" = "yes" ; then
+  echo "CONFIG_MEMBARRIER=y" >> $config_host_mak
+fi
 if test "$signalfd" = "yes" ; then
   echo "CONFIG_SIGNALFD=y" >> $config_host_mak
 fi
diff --git a/include/qemu/sys_membarrier.h b/include/qemu/sys_membarrier.h
index 9ce7f5210b..316e3dc4a2 100644
--- a/include/qemu/sys_membarrier.h
+++ b/include/qemu/sys_membarrier.h
@@ -9,9 +9,19 @@
 #ifndef QEMU_SYS_MEMBARRIER_H
 #define QEMU_SYS_MEMBARRIER_H 1
 
+#ifdef CONFIG_MEMBARRIER
+/* Only block reordering at the compiler level in the performance-critical
+ * side.  The slow side forces processor-level ordering on all other cores
+ * through a system call.
+ */
+extern void smp_mb_global_init(void);
+extern void smp_mb_global(void);
+#define smp_mb_placeholder()       barrier()
+#else
 /* Keep it simple, execute a real memory barrier on both sides.  */
 static inline void smp_mb_global_init(void) {}
 #define smp_mb_global()            smp_mb()
 #define smp_mb_placeholder()       smp_mb()
+#endif
 
 #endif
diff --git a/util/Makefile.objs b/util/Makefile.objs
index ae90b9963d..728c3541db 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -33,6 +33,7 @@ util-obj-y += throttle.o
 util-obj-y += getauxval.o
 util-obj-y += readline.o
 util-obj-y += rcu.o
+util-obj-$(CONFIG_MEMBARRIER) += sys_membarrier.o
 util-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
 util-obj-y += qemu-coroutine-sleep.o
 util-obj-y += coroutine-$(CONFIG_COROUTINE_BACKEND).o
diff --git a/util/sys_membarrier.c b/util/sys_membarrier.c
new file mode 100644
index 0000000000..8dcb53e63e
--- /dev/null
+++ b/util/sys_membarrier.c
@@ -0,0 +1,50 @@
+/*
+ * Process-global memory barriers
+ *
+ * Copyright (c) 2018 Red Hat, Inc.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ */
+
+#include <qemu/osdep.h>
+#include <qemu/sys_membarrier.h>
+#include <qemu/error-report.h>
+
+#ifdef CONFIG_LINUX
+#include <linux/membarrier.h>
+#include <sys/syscall.h>
+
+static int
+membarrier(int cmd, int flags)
+{
+    return syscall(__NR_membarrier, cmd, flags);
+}
+#endif
+
+void smp_mb_global(void)
+{
+#if defined CONFIG_WIN32
+    FlushProcessWriteBuffers();
+#elif defined CONFIG_LINUX
+    membarrier(MEMBARRIER_CMD_SHARED, 0);
+#else
+#error --enable-membarrier is not supported on this operating system.
+#endif
+}
+
+void smp_mb_global_init(void)
+{
+#ifdef CONFIG_LINUX
+    int ret = membarrier(MEMBARRIER_CMD_QUERY, 0);
+    if (ret < 0) {
+        error_report("This QEMU binary requires the membarrier system call.");
+        error_report("Please upgrade your system to a newer version of Linux");
+        exit(1);
+    }
+    if (!(ret & MEMBARRIER_CMD_SHARED)) {
+        error_report("This QEMU binary requires MEMBARRIER_CMD_SHARED support.");
+        error_report("Please upgrade your system to a newer version of Linux");
+        exit(1);
+    }
+#endif
+}
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU
  2018-03-09 13:29 [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU Paolo Bonzini
                   ` (3 preceding siblings ...)
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 4/4] membarrier: add --enable-membarrier Paolo Bonzini
@ 2018-03-09 13:37 ` no-reply
  2018-03-22  1:03 ` Emilio G. Cota
  5 siblings, 0 replies; 9+ messages in thread
From: no-reply @ 2018-03-09 13:37 UTC (permalink / raw)
  To: pbonzini; +Cc: famz, qemu-devel, cota

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180309132922.24211-1-pbonzini@redhat.com
Subject: [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20180309132922.24211-1-pbonzini@redhat.com -> patchew/20180309132922.24211-1-pbonzini@redhat.com
Switched to a new branch 'test'
2912d721ce membarrier: add --enable-membarrier
5f0f5753cc membarrier: introduce qemu/sys_membarrier.h
8acb0ecf90 rcu: make memory barriers more explicit
7c506b9d5d rcutorture: remove synchronize_rcu from readers

=== OUTPUT BEGIN ===
Checking PATCH 1/4: rcutorture: remove synchronize_rcu from readers...
Checking PATCH 2/4: rcu: make memory barriers more explicit...
Checking PATCH 3/4: membarrier: introduce qemu/sys_membarrier.h...
ERROR: memory barrier without comment
#70: FILE: include/qemu/sys_membarrier.h:14:
+#define smp_mb_global()            smp_mb()

ERROR: memory barrier without comment
#71: FILE: include/qemu/sys_membarrier.h:15:
+#define smp_mb_placeholder()       smp_mb()

total: 2 errors, 0 warnings, 77 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/4: membarrier: add --enable-membarrier...
WARNING: line over 80 characters
#190: FILE: util/sys_membarrier.c:45:
+        error_report("This QEMU binary requires MEMBARRIER_CMD_SHARED support.");

total: 0 errors, 1 warnings, 152 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU
  2018-03-09 13:29 [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU Paolo Bonzini
                   ` (4 preceding siblings ...)
  2018-03-09 13:37 ` [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU no-reply
@ 2018-03-22  1:03 ` Emilio G. Cota
  5 siblings, 0 replies; 9+ messages in thread
From: Emilio G. Cota @ 2018-03-22  1:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Fri, Mar 09, 2018 at 14:29:18 +0100, Paolo Bonzini wrote:
> I didn't have time to check whether it improves performance (not
> too likely, since QEMU tends to have pretty long RCU critical
> sections), but it cannot hurt either. :)

Sorry it took me more than a week to go through this, I was busy
working on the FP patchset.

I see this set is already on master; nonetheless here are some
minor comments.

		E.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] membarrier: add --enable-membarrier
  2018-03-09 13:29 ` [Qemu-devel] [PATCH 4/4] membarrier: add --enable-membarrier Paolo Bonzini
@ 2018-03-22  1:29   ` Emilio G. Cota
  2018-03-22  8:57     ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: Emilio G. Cota @ 2018-03-22  1:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Fri, Mar 09, 2018 at 14:29:22 +0100, Paolo Bonzini wrote:
> Actually enable the global memory barriers if supported by the OS.
> Because only recent versions of Linux include the support, they
> are disabled by default.  Note that it also has to be disabled
> for QEMU to run under Wine.
> 
> Before this patch, rcutorture reports 85 ns/read for my machine,
> after the patch it reports 12.5 ns/read.  On the other hand updates
> go from 50 *micro*seconds to 20 *milli*seconds.

It is indeed hard to see a large impact on performance given the
large size of our critical sections. But hey, rcu_read_unlock
goes down from 0.24% to 0.08% of execution time when booting
aarch64 linux!

As we remove bottlenecks though we should be able to gain
more benefits from this, at least in MTTCG where vcpu threads
exit the execution loop quite often.

I did some tests on qht-bench, moving the rcu_read_lock/unlock
pair to wrap each lookup instead of wrapping the entire test.
The results are great; without membarrier lookup throughput
goes down by half; with it, throughput only goes down by 5%.

(snip)
> +##########################################
> +# check for usable membarrier system call
> +if test "$membarrier" = "yes"; then
> +    have_membarrier=no
> +    if test "$mingw32" = "yes" ; then
> +        have_membarrier=yes
> +    elif test "$linux" = "yes" ; then
> +        cat > $TMPC << EOF
> +    #include <linux/membarrier.h>
> +    #include <sys/syscall.h>
> +    #include <unistd.h>
> +    int main(void) {
> +        syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);
> +        syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
> +    }

I think we should also check here that MEMBARRIER_CMD_SHARED is
actually supported; it is possible for a kernel to have
the system call yet not support it (e.g. when the
kernel is compiled with nohz_full). Instead of failing at run-time
(in smp_mb_global_init) we should perhaps bark at configure time
as well.

Other than that the patches look good. Thanks for doing this!

		Emilio

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] membarrier: add --enable-membarrier
  2018-03-22  1:29   ` Emilio G. Cota
@ 2018-03-22  8:57     ` Paolo Bonzini
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2018-03-22  8:57 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel

On 22/03/2018 02:29, Emilio G. Cota wrote:
> It is indeed hard to see a large impact on performance given the
> large size of our critical sections. But hey, rcu_read_unlock
> goes down from 0.24% to 0.08% of execution time when booting
> aarch64 linux!

I expect something a little better than 0.15% from virtio, especially
once I re-enable MemoryRegionCache in 2.13.

>> +    int main(void) {
>> +        syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);
>> +        syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
>> +    }
> 
> I think we should also check here that MEMBARRIER_CMD_SHARED is
> actually supported;

Checking run-time constraints at compile-time is a bit pointless.  My
idea was to add a third mode where the choice is done at run-time, but
that wouldn't have made 2.12.

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-03-22  8:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-09 13:29 [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU Paolo Bonzini
2018-03-09 13:29 ` [Qemu-devel] [PATCH 1/4] rcutorture: remove synchronize_rcu from readers Paolo Bonzini
2018-03-09 13:29 ` [Qemu-devel] [PATCH 2/4] rcu: make memory barriers more explicit Paolo Bonzini
2018-03-09 13:29 ` [Qemu-devel] [PATCH 3/4] membarrier: introduce qemu/sys_membarrier.h Paolo Bonzini
2018-03-09 13:29 ` [Qemu-devel] [PATCH 4/4] membarrier: add --enable-membarrier Paolo Bonzini
2018-03-22  1:29   ` Emilio G. Cota
2018-03-22  8:57     ` Paolo Bonzini
2018-03-09 13:37 ` [Qemu-devel] [PATCH 0/4] Optionally use membarrier system call for RCU no-reply
2018-03-22  1:03 ` Emilio G. Cota

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.