Re: [PATCH] x86 rwsem optimization extreme

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Zachary Amsden <zamsden@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Avi Kivity <avi@redhat.com>
Subject: Re: [PATCH] x86 rwsem optimization extreme
Date: Wed, 17 Feb 2010 14:10:28 -0800 (PST)
Message-ID: <alpine.LFD.2.00.1002171403160.4141@localhost.localdomain>
In-Reply-To: <1266443901-3646-1-git-send-email-zamsden@redhat.com>

On Wed, 17 Feb 2010, Zachary Amsden wrote:
>
> The x86 instruction set provides the ability to add an additional
> bit into addition or subtraction by using the carry flag.
> It also provides instructions to directly set or clear the
> carry flag.  By forcibly setting the carry flag, we can then
> represent one particular 64-bit constant, namely
> 
>    0xffffffff + 1 = 0x100000000
> 
> using only 32-bit values.  In particular we can optimize the rwsem
> write lock release by noting it is of exactly this form.

Don't do this.

Just shift the constants down by two bits, and suddenly you don't need any
clever tricks, because all the constants fit in 32 bits anyway,
regardless of sign issues.

So just change the 

	# define RWSEM_ACTIVE_MASK              0xffffffffL

line into

	# define RWSEM_ACTIVE_MASK              0x3fffffffL

and you're done.
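
To make the arithmetic concrete, here is a minimal sketch that checks which
mask values keep every rwsem constant within a sign-extended 32-bit
immediate, which is what most x86-64 ALU instructions accept. It assumes
the other constants are derived from the mask as RWSEM_WAITING_BIAS =
-RWSEM_ACTIVE_MASK - 1 and RWSEM_ACTIVE_WRITE_BIAS = RWSEM_WAITING_BIAS + 1.
The helper names (fits_imm32, waiting_bias, active_write_bias,
rwsem_constants_fit) are made up for this illustration and are not kernel
functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v can be encoded as a sign-extended 32-bit immediate. */
static bool fits_imm32(int64_t v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}

/* Assumed derivation: RWSEM_WAITING_BIAS = -RWSEM_ACTIVE_MASK - 1 */
static int64_t waiting_bias(int64_t active_mask)
{
	return -active_mask - 1;
}

/* Assumed derivation: RWSEM_ACTIVE_WRITE_BIAS = RWSEM_WAITING_BIAS + 1 */
static int64_t active_write_bias(int64_t active_mask)
{
	return waiting_bias(active_mask) + 1;
}

/*
 * True if the mask, both biases, and the negations of both biases all
 * fit in an imm32.  The negations are checked too, because the unlock
 * paths need the biases with the opposite sign.
 */
static bool rwsem_constants_fit(int64_t active_mask)
{
	int64_t wait = waiting_bias(active_mask);
	int64_t wr = active_write_bias(active_mask);

	return fits_imm32(active_mask) &&
	       fits_imm32(wait) && fits_imm32(-wait) &&
	       fits_imm32(wr) && fits_imm32(-wr);
}
```

Under these assumptions, 0xffffffff fails because the mask itself is
already out of range. 0x7fffffff also fails, because the negated waiting
bias is 0x80000000, one past the largest positive imm32. 0x3fffffff passes
with either sign, which is why the shift is by two bits rather than one.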

The cost of 'adc' may happen to be identical in this case, but I suspect 
you didn't test on UP, where the 'lock' prefix goes away. An unlocked 
'add' tends to be faster than an unlocked 'adc'.

(It's possible that some micro-architectures don't care, since it's a 
memory op, and they can see that 'C' is set. But it's a fragile assumption 
that it would always be ok).

			Linus