All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/3] atomics: fix RCU perf. regression + update documentation
@ 2016-05-24 20:06 Emilio G. Cota
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 1/3] docs/atomics: update atomic_read/set comparison with Linux Emilio G. Cota
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Emilio G. Cota @ 2016-05-24 20:06 UTC (permalink / raw)
  To: QEMU Developers, MTTCG Devel
  Cc: Alex Bennée, Paolo Bonzini, Richard Henderson, Sergey Fedorov

v1: https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg03661.html

Patch 1 hasn't changed from v1 (where it was patch 2 though).

Patches 2 and 3 fix a not-so-small-after-all RCU performance regression
we introduced when transitioning to __atomic primitives. I got
an arm64 machine to test today and a workload that issues a lot of
atomic_read_rcu's, such as a 100%-lookup qht-bench test,
can gain ~12% in performance. [ in v1's 0/2 message I mentioned rcutorture;
It turns out it's not as dependent on atomic_read_rcu as I thought,
so it's not a good benchmark to measure this effect. ]

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH v2 1/3] docs/atomics: update atomic_read/set comparison with Linux
  2016-05-24 20:06 [Qemu-devel] [PATCH v2 0/3] atomics: fix RCU perf. regression + update documentation Emilio G. Cota
@ 2016-05-24 20:06 ` Emilio G. Cota
  2016-05-25 12:13   ` Paolo Bonzini
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer Emilio G. Cota
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 3/3] atomics: do not emit consume barrier for atomic_rcu_read Emilio G. Cota
  2 siblings, 1 reply; 10+ messages in thread
From: Emilio G. Cota @ 2016-05-24 20:06 UTC (permalink / raw)
  To: QEMU Developers, MTTCG Devel
  Cc: Alex Bennée, Paolo Bonzini, Richard Henderson, Sergey Fedorov

Recently Linux did a mass conversion of its atomic_read/set calls
so that they at least are READ/WRITE_ONCE. See Linux's commit
62e8a325 ("atomic, arch: Audit atomic_{read,set}()"). It seems though
that their documentation hasn't been updated to reflect this.

The appended updates our documentation to reflect the change, which
means there is effectively no difference between our atomic_read/set
and the current Linux implementation.

While at it, fix the statement that a barrier is implied by
atomic_read/set, which is incorrect. Volatile/atomic semantics prevent
transformations pertaining the variable they apply to; this, however,
has no effect on surrounding statements like barriers do. For more
details on this, see:
  https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 docs/atomics.txt | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/docs/atomics.txt b/docs/atomics.txt
index ef285e3..7540990 100644
--- a/docs/atomics.txt
+++ b/docs/atomics.txt
@@ -326,9 +326,19 @@ and memory barriers, and the equivalents in QEMU:
   use a boxed atomic_t type; atomic operations in QEMU are polymorphic
   and use normal C types.
 
-- atomic_read and atomic_set in Linux give no guarantee at all;
-  atomic_read and atomic_set in QEMU include a compiler barrier
-  (similar to the ACCESS_ONCE macro in Linux).
+- Originally, atomic_read and atomic_set in Linux gave no guarantee
+  at all. Recently they have been updated to implement volatile
+  semantics via ACCESS_ONCE (or the more recent READ/WRITE_ONCE).
+
+  QEMU's atomic_read/set implement, if the compiler supports it, C11
+  atomic relaxed semantics, and volatile semantics otherwise.
+  Both semantics prevent the compiler from doing certain transformations;
+  the difference is that atomic accesses are guaranteed to be atomic,
+  while volatile accesses aren't. Thus, in the volatile case we just cross
+  our fingers hoping that the compiler will generate atomic accesses,
+  since we assume the variables passed are machine-word sized and
+  properly aligned.
+  No barriers are implied by atomic_read/set in either Linux or QEMU.
 
 - most atomic read-modify-write operations in Linux return void;
   in QEMU, all of them return the old value of the variable.
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer
  2016-05-24 20:06 [Qemu-devel] [PATCH v2 0/3] atomics: fix RCU perf. regression + update documentation Emilio G. Cota
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 1/3] docs/atomics: update atomic_read/set comparison with Linux Emilio G. Cota
@ 2016-05-24 20:06 ` Emilio G. Cota
  2016-05-24 20:09   ` Sergey Fedorov
  2016-05-25 12:16   ` Paolo Bonzini
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 3/3] atomics: do not emit consume barrier for atomic_rcu_read Emilio G. Cota
  2 siblings, 2 replies; 10+ messages in thread
From: Emilio G. Cota @ 2016-05-24 20:06 UTC (permalink / raw)
  To: QEMU Developers, MTTCG Devel
  Cc: Alex Bennée, Paolo Bonzini, Richard Henderson, Sergey Fedorov

For correctness, smp_read_barrier_depends() is only required to
emit a barrier on Sparc hosts. However, we are currently emitting
a consume fence unconditionally.

Fix it by keeping the consume fence if we're compiling with Thread
Sanitizer, since this might help prevent false warnings. Otherwise,
only emit the barrier for Sparc hosts. Note that we still guarantee
that smp_read_barrier_depends() is a compiler barrier.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/qemu/atomic.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 5bc4d6c..4a4f2fb 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -36,7 +36,14 @@
 #define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
 #define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
 
+#if defined(__SANITIZE_THREAD__)
 #define smp_read_barrier_depends() ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })
+#elsif defined(__alpha__)
+#define smp_read_barrier_depends()   asm volatile("mb":::"memory")
+#else
+#define smp_read_barrier_depends()   barrier()
+#endif
+
 
 /* Weak atomic operations prevent the compiler moving other
  * loads/stores past the atomic operation load/store. However there is
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH v2 3/3] atomics: do not emit consume barrier for atomic_rcu_read
  2016-05-24 20:06 [Qemu-devel] [PATCH v2 0/3] atomics: fix RCU perf. regression + update documentation Emilio G. Cota
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 1/3] docs/atomics: update atomic_read/set comparison with Linux Emilio G. Cota
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer Emilio G. Cota
@ 2016-05-24 20:06 ` Emilio G. Cota
  2016-05-25 12:20   ` Paolo Bonzini
  2 siblings, 1 reply; 10+ messages in thread
From: Emilio G. Cota @ 2016-05-24 20:06 UTC (permalink / raw)
  To: QEMU Developers, MTTCG Devel
  Cc: Alex Bennée, Paolo Bonzini, Richard Henderson, Sergey Fedorov

Currently we emit a consume-load in atomic_rcu_read. This is
overkill for non-Sparc hosts, and is only useful to make
things easier for Thread Sanitizer, which as far as I understand
works best without explicit fences.

The appended leaves the consume-load in atomic_rcu_read when
compiling with Thread Sanitizer enabled, and resorts to a
relaxed load + smp_read_barrier_depends otherwise.

On an RMO host architecture, such as aarch64, the performance
improvement of this change is easily measurable. For instance,
qht-bench performs an atomic_rcu_read on every lookup. Performance
before and after applying this patch:

$ tests/qht-bench -d 5 -n 1
Before: 9.78 MT/s
After:  10.96 MT/s

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/qemu/atomic.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 4a4f2fb..c5b6c8d 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -63,13 +63,20 @@
     __atomic_store(ptr, &_val, __ATOMIC_RELAXED);     \
 } while(0)
 
-/* Atomic RCU operations imply weak memory barriers */
+#ifdef __SANITIZE_THREAD__
+#define atomic_rcu_read__nocheck(ptr, valptr)           \
+    __atomic_load(ptr, valptr, __ATOMIC_CONSUME);
+#else
+#define atomic_rcu_read__nocheck(ptr, valptr)           \
+    __atomic_load(ptr, valptr, __ATOMIC_RELAXED);       \
+    smp_read_barrier_depends();
+#endif
 
 #define atomic_rcu_read(ptr)                          \
     ({                                                \
     QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
     typeof(*ptr) _val;                                \
-    __atomic_load(ptr, &_val, __ATOMIC_CONSUME);      \
+    atomic_rcu_read__nocheck(ptr, &_val);             \
     _val;                                             \
     })
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer Emilio G. Cota
@ 2016-05-24 20:09   ` Sergey Fedorov
  2016-05-24 20:44     ` Emilio G. Cota
  2016-05-25 12:16   ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Sergey Fedorov @ 2016-05-24 20:09 UTC (permalink / raw)
  To: Emilio G. Cota, QEMU Developers, MTTCG Devel
  Cc: Alex Bennée, Paolo Bonzini, Richard Henderson

On 24/05/16 23:06, Emilio G. Cota wrote:
> For correctness, smp_read_barrier_depends() is only required to
> emit a barrier on Sparc hosts. However, we are currently emitting
> a consume fence unconditionally.
>
> Fix it by keeping the consume fence if we're compiling with Thread
> Sanitizer, since this might help prevent false warnings. Otherwise,
> only emit the barrier for Sparc hosts. Note that we still guarantee
> that smp_read_barrier_depends() is a compiler barrier.

s/Sparc/Alpha/?

Regards,
Sergey

>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  include/qemu/atomic.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 5bc4d6c..4a4f2fb 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -36,7 +36,14 @@
>  #define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
>  #define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
>  
> +#if defined(__SANITIZE_THREAD__)
>  #define smp_read_barrier_depends() ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })
> +#elsif defined(__alpha__)
> +#define smp_read_barrier_depends()   asm volatile("mb":::"memory")
> +#else
> +#define smp_read_barrier_depends()   barrier()
> +#endif
> +
>  
>  /* Weak atomic operations prevent the compiler moving other
>   * loads/stores past the atomic operation load/store. However there is

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer
  2016-05-24 20:09   ` Sergey Fedorov
@ 2016-05-24 20:44     ` Emilio G. Cota
  0 siblings, 0 replies; 10+ messages in thread
From: Emilio G. Cota @ 2016-05-24 20:44 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: QEMU Developers, MTTCG Devel, Alex Bennée, Paolo Bonzini,
	Richard Henderson

On Tue, May 24, 2016 at 23:09:47 +0300, Sergey Fedorov wrote:
> On 24/05/16 23:06, Emilio G. Cota wrote:
> > For correctness, smp_read_barrier_depends() is only required to
> > emit a barrier on Sparc hosts. However, we are currently emitting
> > a consume fence unconditionally.
> >
> > Fix it by keeping the consume fence if we're compiling with Thread
> > Sanitizer, since this might help prevent false warnings. Otherwise,
> > only emit the barrier for Sparc hosts. Note that we still guarantee
> > that smp_read_barrier_depends() is a compiler barrier.
> 
> s/Sparc/Alpha/?

Obviously :-) Typo also in patch 3 -- apologies.

		E.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] docs/atomics: update atomic_read/set comparison with Linux
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 1/3] docs/atomics: update atomic_read/set comparison with Linux Emilio G. Cota
@ 2016-05-25 12:13   ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2016-05-25 12:13 UTC (permalink / raw)
  To: Emilio G. Cota, QEMU Developers, MTTCG Devel
  Cc: Alex Bennée, Richard Henderson, Sergey Fedorov



On 24/05/2016 22:06, Emilio G. Cota wrote:
> Recently Linux did a mass conversion of its atomic_read/set calls
> so that they at least are READ/WRITE_ONCE. See Linux's commit
> 62e8a325 ("atomic, arch: Audit atomic_{read,set}()"). It seems though
> that their documentation hasn't been updated to reflect this.

Let's mention explicitly that the change happened in Linux 4.1.  Queued
with this small change.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer Emilio G. Cota
  2016-05-24 20:09   ` Sergey Fedorov
@ 2016-05-25 12:16   ` Paolo Bonzini
  2016-05-25 15:06     ` Emilio G. Cota
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2016-05-25 12:16 UTC (permalink / raw)
  To: Emilio G. Cota, QEMU Developers, MTTCG Devel
  Cc: Alex Bennée, Richard Henderson, Sergey Fedorov



On 24/05/2016 22:06, Emilio G. Cota wrote:
> For correctness, smp_read_barrier_depends() is only required to
> emit a barrier on Sparc hosts. However, we are currently emitting
> a consume fence unconditionally.

Let's say why this is suboptimal:

... and most compilers currently treat consume and acquire fences as 
equivalent.

Likewise, let's add a comment like this:

+/* Most compilers currently treat consume and acquire the same, but really
+ * no processors except Alpha need a barrier here.  Leave it in if
+ * using Thread Sanitizer to avoid warnings, otherwise optimize it away.
+ */

If okay I can do the change myself.

Thanks,

Paolo

> 
> Fix it by keeping the consume fence if we're compiling with Thread
> Sanitizer, since this might help prevent false warnings. Otherwise,
> only emit the barrier for Sparc hosts. Note that we still guarantee
> that smp_read_barrier_depends() is a compiler barrier.
> 
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  include/qemu/atomic.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 5bc4d6c..4a4f2fb 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -36,7 +36,14 @@
>  #define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
>  #define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
>  
> +#if defined(__SANITIZE_THREAD__)
>  #define smp_read_barrier_depends() ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })
> +#elsif defined(__alpha__)
> +#define smp_read_barrier_depends()   asm volatile("mb":::"memory")
> +#else
> +#define smp_read_barrier_depends()   barrier()
> +#endif
> +
>  
>  /* Weak atomic operations prevent the compiler moving other
>   * loads/stores past the atomic operation load/store. However there is
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/3] atomics: do not emit consume barrier for atomic_rcu_read
  2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 3/3] atomics: do not emit consume barrier for atomic_rcu_read Emilio G. Cota
@ 2016-05-25 12:20   ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2016-05-25 12:20 UTC (permalink / raw)
  To: Emilio G. Cota, QEMU Developers, MTTCG Devel
  Cc: Alex Bennée, Richard Henderson, Sergey Fedorov



On 24/05/2016 22:06, Emilio G. Cota wrote:
> Currently we emit a consume-load in atomic_rcu_read. This is
> overkill for non-Sparc hosts, and is only useful to make
> things easier for Thread Sanitizer, which as far as I understand
> works best without explicit fences.

Likewise:

Currently we emit a consume-load in atomic_rcu_read.  Because of
limitations in current compilers, this is overkill for non-Alpha hosts
and it is only useful to make Thread Sanitizer work.

and

+/* See above: most compilers currently treat consume and acquire the
+ * same, but this slows down atomic_rcu_read unnecessarily.
+ */

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer
  2016-05-25 12:16   ` Paolo Bonzini
@ 2016-05-25 15:06     ` Emilio G. Cota
  0 siblings, 0 replies; 10+ messages in thread
From: Emilio G. Cota @ 2016-05-25 15:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: QEMU Developers, MTTCG Devel, Alex Bennée,
	Richard Henderson, Sergey Fedorov

On Wed, May 25, 2016 at 14:16:56 +0200, Paolo Bonzini wrote:
> On 24/05/2016 22:06, Emilio G. Cota wrote:
> > For correctness, smp_read_barrier_depends() is only required to
> > emit a barrier on Sparc hosts. However, we are currently emitting
> > a consume fence unconditionally.
> 
> Let's say why this is suboptimal:
> 
> ... and most compilers currently treat consume and acquire fences as 
> equivalent.
> 
> Likewise, let's add a comment like this:
> 
> +/* Most compilers currently treat consume and acquire the same, but really
> + * no processors except Alpha need a barrier here.  Leave it in if
> + * using Thread Sanitizer to avoid warnings, otherwise optimize it away.
> + */
> 
> If okay I can do the change myself.

On Wed, May 25, 2016 at 14:20:02 +0200, Paolo Bonzini wrote:
> On 24/05/2016 22:06, Emilio G. Cota wrote:
> > Currently we emit a consume-load in atomic_rcu_read. This is
> > overkill for non-Sparc hosts, and is only useful to make
> > things easier for Thread Sanitizer, which as far as I understand
> > works best without explicit fences.
> 
> Likewise:
> 
> Currently we emit a consume-load in atomic_rcu_read.  Because of
> limitations in current compilers, this is overkill for non-Alpha hosts
> and it is only useful to make Thread Sanitizer work.
> 
> and
> 
> +/* See above: most compilers currently treat consume and acquire the
> + * same, but this slows down atomic_rcu_read unnecessarily.
> + */

Please go ahead with these changes. Don't forget to s/Sparc/Alpha/ on the
commit messages! There are 3 bogus Sparc's in the commit log of
(my) patch 2/3, including the commit title.

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-05-25 15:06 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-24 20:06 [Qemu-devel] [PATCH v2 0/3] atomics: fix RCU perf. regression + update documentation Emilio G. Cota
2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 1/3] docs/atomics: update atomic_read/set comparison with Linux Emilio G. Cota
2016-05-25 12:13   ` Paolo Bonzini
2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 2/3] atomics: emit an smp_read_barrier_depends() barrier only for Sparc and Thread Sanitizer Emilio G. Cota
2016-05-24 20:09   ` Sergey Fedorov
2016-05-24 20:44     ` Emilio G. Cota
2016-05-25 12:16   ` Paolo Bonzini
2016-05-25 15:06     ` Emilio G. Cota
2016-05-24 20:06 ` [Qemu-devel] [PATCH v2 3/3] atomics: do not emit consume barrier for atomic_rcu_read Emilio G. Cota
2016-05-25 12:20   ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.