Date: Mon, 23 May 2016 11:55:10 -0400
From: "Emilio G. Cota"
Message-ID: <20160523155510.GC1768@flamenco>
References: <1463863336-28760-1-git-send-email-cota@braap.org> <1463863336-28760-2-git-send-email-cota@braap.org> <955e8307-01a5-b2f9-48df-8309bd30c443@redhat.com>
In-Reply-To: <955e8307-01a5-b2f9-48df-8309bd30c443@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/2] atomics: do not use __atomic primitives for RCU atomics
To: Paolo Bonzini
Cc: QEMU Developers, MTTCG Devel, Alex Bennée, Richard Henderson, Sergey Fedorov

On Mon, May 23, 2016 at 16:21:36 +0200, Paolo Bonzini wrote:
> On 21/05/2016 22:42, Emilio G. Cota wrote:
> > Commit a0aa44b4 ("include/qemu/atomic.h: default to __atomic functions")
> > set all atomics to default (on recent GCC versions) to __atomic primitives.
> >
> > In the process, the atomic_rcu_read/set were converted to implement
> > consume/release semantics, respectively. This is inefficient; for
> > correctness and maximum performance we only need an smp_barrier_depends
> > for reads, and an smp_wmb for writes. Fix it by using the original
> > definition of these two primitives for all compilers.
>
> Indeed most compilers implement consume the same as acquire, which is
> inefficient.
>
> However, isn't in practice atomic_thread_fence(release) +
> atomic_store(relaxed) the same as atomic_store(release)?

Yes. However, this is not the issue I'm addressing with the patch.

The performance regression I measured is due to using load-acquire
vs. load+smp_read_barrier_depends(). In the latter case only Alpha
will emit a fence; in the former we always emit a load-acquire,
which is "stronger" (i.e. more constraining).

A similar thing applies to atomic_rcu_set, although I haven't
measured its impact. We only need smp_wmb+store, yet we emit a
store-release, which is again "stronger".

		E.
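
To make the comparison concrete, here is a minimal sketch of the two flavours discussed above, assuming GCC or Clang. All `sketch_*` names and `struct config` are hypothetical and are not the macros from include/qemu/atomic.h; the release fence standing in for smp_wmb is also an assumption for illustration.

```c
#include <stddef.h>

/* Hypothetical payload and RCU-protected pointer, for illustration only. */
struct config {
    int value;
};

static struct config *sketch_shared;

/* Assumed stand-in for smp_read_barrier_depends(): a real fence only on
 * Alpha, a plain compiler barrier everywhere else. */
#if defined(__alpha__)
#define sketch_read_barrier_depends() __asm__ __volatile__("mb" ::: "memory")
#else
#define sketch_read_barrier_depends() __asm__ __volatile__("" ::: "memory")
#endif

/* Assumed stand-in for smp_wmb(): a release fence. */
#define sketch_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)

/* Flavour A: consume load. Compilers typically promote consume to acquire,
 * so this constrains more than an RCU reader needs. */
static inline struct config *sketch_rcu_read_consume(void)
{
    return __atomic_load_n(&sketch_shared, __ATOMIC_CONSUME);
}

/* Flavour A: store-release. */
static inline void sketch_rcu_set_release(struct config *p)
{
    __atomic_store_n(&sketch_shared, p, __ATOMIC_RELEASE);
}

/* Flavour B: relaxed load followed by a dependency barrier. */
static inline struct config *sketch_rcu_read_depends(void)
{
    struct config *p = __atomic_load_n(&sketch_shared, __ATOMIC_RELAXED);

    sketch_read_barrier_depends();
    return p;
}

/* Flavour B: write barrier followed by a relaxed store. */
static inline void sketch_rcu_set_wmb(struct config *p)
{
    sketch_wmb();
    __atomic_store_n(&sketch_shared, p, __ATOMIC_RELAXED);
}
```

Both flavours publish and read the same pointer. They differ only in how much ordering the compiler and CPU are asked to enforce around the access, which is where the measured cost comes from.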