From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41070) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b4tM9-0005FK-H8 for qemu-devel@nongnu.org; Mon, 23 May 2016 13:09:30 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b4tM5-0003Z6-6R for qemu-devel@nongnu.org; Mon, 23 May 2016 13:09:28 -0400 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:40662) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b4tM2-0003Uj-Ql for qemu-devel@nongnu.org; Mon, 23 May 2016 13:09:25 -0400 Date: Mon, 23 May 2016 13:09:12 -0400 From: "Emilio G. Cota" Message-ID: <20160523170912.GA16390@flamenco> References: <1463863336-28760-1-git-send-email-cota@braap.org> <1463863336-28760-2-git-send-email-cota@braap.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Qemu-devel] [PATCH 1/2] atomics: do not use __atomic primitives for RCU atomics List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Richard Henderson Cc: QEMU Developers , MTTCG Devel , Alex =?iso-8859-1?Q?Benn=E9e?= , Paolo Bonzini , Sergey Fedorov On Mon, May 23, 2016 at 09:53:00 -0700, Richard Henderson wrote: > On 05/21/2016 01:42 PM, Emilio G. Cota wrote: > >In the process, the atomic_rcu_read/set were converted to implement > >consume/release semantics, respectively. This is inefficient; for > >correctness and maximum performance we only need an smp_barrier_depends > >for reads, and an smp_wmb for writes. Fix it by using the original > >definition of these two primitives for all compilers. > > For what host do you think this is inefficient? > > In particular, what you've done is going to be less efficient for e.g. > armv8, where the __atomic formulation is going to produce load-acquire and > store-release instructions. Whereas the separate barriers are going to > produce two insns. > > As for the common case of x86_64, what you're doing is going to make no > difference at all. > > So what are you trying to improve? Precisely I tested this on ARMv8. The goal is to not emit a fence at all, i.e. to emit a single store instead of LDR (load-acquire). I just realised that under #ifdef __ATOMIC we have: #define smp_read_barrier_depends() ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); }) Why? This should be: #ifdef __alpha__ #define smp_read_barrier_depends() asm volatile("mb":::"memory") #endif unconditionally. My patch should have included this additional change to make sense. Sorry for the confusion. E. PS. And really equating smp_wmb/rmb to release/acquire as we have under #ifdef __ATOMIC is hard to justify, other than to please tsan.