From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751099AbcFUJI4 (ORCPT ); Tue, 21 Jun 2016 05:08:56 -0400
Received: from mx1.redhat.com ([209.132.183.28]:33888 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750696AbcFUJIw (ORCPT ); Tue, 21 Jun 2016 05:08:52 -0400
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley
	Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United
	Kingdom. Registered in England and Wales under Company Registration
	No. 3798903
From: David Howells
In-Reply-To: <5e5b99d7-c739-9743-b3e0-fbe0636d6dee@zytor.com>
References: <5e5b99d7-c739-9743-b3e0-fbe0636d6dee@zytor.com>
	<40fd5f74-190e-b805-fbaa-f84899190fbc@zytor.com>
	<20160615085002.GC30935@twins.programming.kicks-ass.net>
To: "H. Peter Anvin"
Cc: dhowells@redhat.com, Peter Zijlstra, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, Ingo Molnar, Thomas Gleixner
Subject: Re: cmpxchg and x86 flags output
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <4466.1466499980.1@warthog.procyon.org.uk>
Date: Tue, 21 Jun 2016 10:06:20 +0100
Message-ID: <4467.1466499980@warthog.procyon.org.uk>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.30]); Tue, 21 Jun 2016 09:06:22 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

H. Peter Anvin wrote:

> Well, that sounds promising. I wonder how David's model, using
> intrinsics (do we have enough intrinsics to actually be able to do this
> "correctly"?), compare to using the flags output from assembly.

There is an advantage to using the intrinsics on arches with explicit
barriers. On powerpc64, for example, the compiler can move the release
memory barrier earlier to push register-only instructions between the
barrier and the lwarx.
This would allow the memory barrier to be executed concurrently with
those instructions. The compiler could also move the acquire memory
barrier later, pulling register-only instructions between the stwcx and
that barrier, though I don't see any advantage to doing so.

By contrast, if the release barrier is in the same asm block as the
lwarx, the compiler cannot do anything with it.

Another advantage is that the compiler can switch between instruction
variants automatically, allowing us to get rid of the size-based switch
statements for things like cmpxchg(). However, there's probably not a
great deal of difference to be had if the inline asm codes the
appropriate instruction in each case for something like x86*. The
emitted code ought to look the same.

The second biggest win for the intrinsics, I think, is the ability to ask
the CMPXCHG instruction whether it actually did anything rather than
comparing the result. I added two variants, one that only returned the
yes/no and one that passed back the value as well as the yes/no.

David
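
For illustration only, here is a minimal sketch of what the two cmpxchg
variants described above could look like when built on GCC's
__atomic_compare_exchange_n() builtin. The helper names
try_cmpxchg_bool() and try_cmpxchg_value(), the int operand type and the
memory orders chosen are assumptions made for this sketch; they are not
taken from the patches under discussion.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: report only whether the exchange happened.
 * The builtin returns true on success, so the caller does not need to
 * compare the returned value against 'old'.
 */
static inline bool try_cmpxchg_bool(int *ptr, int old, int new_val)
{
	return __atomic_compare_exchange_n(ptr, &old, new_val,
					   false,	/* strong variant */
					   __ATOMIC_SEQ_CST,
					   __ATOMIC_RELAXED);
}

/*
 * Hypothetical helper: report whether the exchange happened and also
 * pass back the value that was found in *ptr.
 */
static inline bool try_cmpxchg_value(int *ptr, int old, int new_val,
				     int *found)
{
	bool ok = __atomic_compare_exchange_n(ptr, &old, new_val,
					      false,	/* strong variant */
					      __ATOMIC_SEQ_CST,
					      __ATOMIC_RELAXED);

	/*
	 * On failure the builtin stores the current contents of *ptr in
	 * 'old'; on success 'old' is untouched and already equals the
	 * value that was found.
	 */
	*found = old;
	return ok;
}
```

On x86 one would expect the compiler to be able to use the flags set by
CMPXCHG for the boolean result here, but that depends on the compiler
version and is not something this sketch demonstrates.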