* [PATCH] rwlock: remove unneeded subtraction
@ 2022-02-15 9:39 Roger Pau Monne
2022-02-15 10:37 ` Jan Beulich
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Roger Pau Monne @ 2022-02-15 9:39 UTC (permalink / raw)
To: xen-devel
Cc: Roger Pau Monne, Andrew Cooper, George Dunlap, Jan Beulich,
Julien Grall, Stefano Stabellini, Wei Liu
There's no need to subtract _QR_BIAS from the lock value for storing
in the local cnts variable in the read lock slow path: the users of
the value in cnts only care about the writer-related bits and use a
mask to get the value.
Note that further setting of cnts in rspin_until_writer_unlock already
do not subtract _QR_BIAS.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/common/rwlock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index dadab372b5..aa15529bbe 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -47,7 +47,7 @@ void queue_read_lock_slowpath(rwlock_t *lock)
while ( atomic_read(&lock->cnts) & _QW_WMASK )
cpu_relax();
- cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
+ cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
rspin_until_writer_unlock(lock, cnts);
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] rwlock: remove unneeded subtraction
2022-02-15 9:39 [PATCH] rwlock: remove unneeded subtraction Roger Pau Monne
@ 2022-02-15 10:37 ` Jan Beulich
2022-02-15 11:54 ` Luca Fancellu
2022-02-15 13:13 ` Julien Grall
2 siblings, 0 replies; 8+ messages in thread
From: Jan Beulich @ 2022-02-15 10:37 UTC (permalink / raw)
To: Roger Pau Monne
Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
Wei Liu, xen-devel
On 15.02.2022 10:39, Roger Pau Monne wrote:
> There's no need to subtract _QR_BIAS from the lock value for storing
> in the local cnts variable in the read lock slow path: the users of
> the value in cnts only care about the writer-related bits and use a
> mask to get the value.
>
> Note that further setting of cnts in rspin_until_writer_unlock already
> do not subtract _QR_BIAS.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rwlock: remove unneeded subtraction
2022-02-15 9:39 [PATCH] rwlock: remove unneeded subtraction Roger Pau Monne
2022-02-15 10:37 ` Jan Beulich
@ 2022-02-15 11:54 ` Luca Fancellu
2022-02-15 13:13 ` Julien Grall
2 siblings, 0 replies; 8+ messages in thread
From: Luca Fancellu @ 2022-02-15 11:54 UTC (permalink / raw)
To: Roger Pau Monne
Cc: Xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
Julien Grall, Stefano Stabellini, Wei Liu
> On 15 Feb 2022, at 09:39, Roger Pau Monne <roger.pau@citrix.com> wrote:
>
> There's no need to subtract _QR_BIAS from the lock value for storing
> in the local cnts variable in the read lock slow path: the users of
> the value in cnts only care about the writer-related bits and use a
> mask to get the value.
>
> Note that further setting of cnts in rspin_until_writer_unlock already
> do not subtract _QR_BIAS.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
> ---
> xen/common/rwlock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
> index dadab372b5..aa15529bbe 100644
> --- a/xen/common/rwlock.c
> +++ b/xen/common/rwlock.c
> @@ -47,7 +47,7 @@ void queue_read_lock_slowpath(rwlock_t *lock)
> while ( atomic_read(&lock->cnts) & _QW_WMASK )
> cpu_relax();
>
> - cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> + cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
> rspin_until_writer_unlock(lock, cnts);
>
> /*
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rwlock: remove unneeded subtraction
2022-02-15 9:39 [PATCH] rwlock: remove unneeded subtraction Roger Pau Monne
2022-02-15 10:37 ` Jan Beulich
2022-02-15 11:54 ` Luca Fancellu
@ 2022-02-15 13:13 ` Julien Grall
2022-02-15 13:22 ` Jan Beulich
2 siblings, 1 reply; 8+ messages in thread
From: Julien Grall @ 2022-02-15 13:13 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel
Cc: Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini, Wei Liu
Hi,
On 15/02/2022 09:39, Roger Pau Monne wrote:
> There's no need to subtract _QR_BIAS from the lock value for storing
> in the local cnts variable in the read lock slow path: the users of
> the value in cnts only care about the writer-related bits and use a
> mask to get the value.
>
> Note that further setting of cnts in rspin_until_writer_unlock already
> do not subtract _QR_BIAS.
The rwlock is a copy of the Linux implementation. So I looked at the
history to find out why _QR_BIAS was substracted.
It looks like this was done to get better assembly on x86:
commit f9852b74bec0117b888da39d070c323ea1cb7f4c
Author: Peter Zijlstra <peterz@infradead.org>
Date: Mon Apr 18 01:27:03 2016 +0200
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
The only reason for the current code is to make GCC emit only the
"LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
the result), do so nicer.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <waiman.long@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index fec082338668..19248ddf37ce 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock,
u32 cnts)
* that accesses can't leak upwards out of our subsequent critical
* section in the case that the lock is currently held for write.
*/
- cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
+ cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
rspin_until_writer_unlock(lock, cnts);
/*
This is a slowpath, so probably not a concern. But I thought I would
double check whether the x86 folks are still happy to proceed with that
in mind.
Cheers,
--
Julien Grall
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] rwlock: remove unneeded subtraction
2022-02-15 13:13 ` Julien Grall
@ 2022-02-15 13:22 ` Jan Beulich
2022-02-15 14:16 ` Roger Pau Monné
0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2022-02-15 13:22 UTC (permalink / raw)
To: Julien Grall, Roger Pau Monne
Cc: Andrew Cooper, George Dunlap, Stefano Stabellini, Wei Liu, xen-devel
On 15.02.2022 14:13, Julien Grall wrote:
> On 15/02/2022 09:39, Roger Pau Monne wrote:
>> There's no need to subtract _QR_BIAS from the lock value for storing
>> in the local cnts variable in the read lock slow path: the users of
>> the value in cnts only care about the writer-related bits and use a
>> mask to get the value.
>>
>> Note that further setting of cnts in rspin_until_writer_unlock already
>> do not subtract _QR_BIAS.
>
> The rwlock is a copy of the Linux implementation. So I looked at the
> history to find out why _QR_BIAS was substracted.
>
> It looks like this was done to get better assembly on x86:
>
> commit f9852b74bec0117b888da39d070c323ea1cb7f4c
> Author: Peter Zijlstra <peterz@infradead.org>
> Date: Mon Apr 18 01:27:03 2016 +0200
>
> locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
>
> The only reason for the current code is to make GCC emit only the
> "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
> the result), do so nicer.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Acked-by: Waiman Long <waiman.long@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index fec082338668..19248ddf37ce 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock,
> u32 cnts)
> * that accesses can't leak upwards out of our subsequent critical
> * section in the case that the lock is currently held for write.
> */
> - cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> + cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
> rspin_until_writer_unlock(lock, cnts);
>
> /*
>
> This is a slowpath, so probably not a concern. But I thought I would
> double check whether the x86 folks are still happy to proceed with that
> in mind.
Hmm, that's an interesting observation. Roger - did you inspect the
generated code? At the very least the description may want amending.
Jan
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rwlock: remove unneeded subtraction
2022-02-15 13:22 ` Jan Beulich
@ 2022-02-15 14:16 ` Roger Pau Monné
2022-02-15 14:45 ` Jan Beulich
0 siblings, 1 reply; 8+ messages in thread
From: Roger Pau Monné @ 2022-02-15 14:16 UTC (permalink / raw)
To: Jan Beulich
Cc: Julien Grall, Andrew Cooper, George Dunlap, Stefano Stabellini,
Wei Liu, xen-devel
On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
> On 15.02.2022 14:13, Julien Grall wrote:
> > On 15/02/2022 09:39, Roger Pau Monne wrote:
> >> There's no need to subtract _QR_BIAS from the lock value for storing
> >> in the local cnts variable in the read lock slow path: the users of
> >> the value in cnts only care about the writer-related bits and use a
> >> mask to get the value.
> >>
> >> Note that further setting of cnts in rspin_until_writer_unlock already
> >> do not subtract _QR_BIAS.
> >
> > The rwlock is a copy of the Linux implementation. So I looked at the
> > history to find out why _QR_BIAS was substracted.
> >
> > It looks like this was done to get better assembly on x86:
> >
> > commit f9852b74bec0117b888da39d070c323ea1cb7f4c
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date: Mon Apr 18 01:27:03 2016 +0200
> >
> > locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
> >
> > The only reason for the current code is to make GCC emit only the
> > "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
> > the result), do so nicer.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Acked-by: Waiman Long <waiman.long@hpe.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: linux-arch@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> >
> > diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> > index fec082338668..19248ddf37ce 100644
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock,
> > u32 cnts)
> > * that accesses can't leak upwards out of our subsequent critical
> > * section in the case that the lock is currently held for write.
> > */
> > - cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> > + cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
> > rspin_until_writer_unlock(lock, cnts);
> >
> > /*
> >
> > This is a slowpath, so probably not a concern. But I thought I would
> > double check whether the x86 folks are still happy to proceed with that
> > in mind.
>
> Hmm, that's an interesting observation. Roger - did you inspect the
> generated code? At the very least the description may want amending.
It seems to always generate the same code for me when using gcc 8.3,
I even tried using arch_fetch_and_add directly, it always results
in:
ffff82d04022d983: f0 0f c1 03 lock xadd %eax,(%rbx)
ffff82d04022d987: 25 00 30 00 00 and $0x3000,%eax
Similarly clang 13.0.0 seem to always generate:
ffff82d0402085de: f0 0f c1 03 lock xadd %eax,(%rbx)
ffff82d0402085e2: 89 c1 mov %eax,%ecx
ffff82d0402085e4: 81 e1 00 30 00 00 and $0x3000,%ecx
Maybe I'm missing something, but I don't see a difference in the
generated code.
I could add to the commit message:
"Originally _QR_BIAS was subtracted from the result of
atomic_add_return_acquire in order to prevent GCC from emitting an
unneeded ADD instruction. This being in the lock slow path such
optimizations don't seem likely to make any relevant performance
difference. Also modern GCC and CLANG versions will already avoid
emitting the ADD instruction."
Thanks, Roger.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rwlock: remove unneeded subtraction
2022-02-15 14:16 ` Roger Pau Monné
@ 2022-02-15 14:45 ` Jan Beulich
2022-02-15 15:40 ` Roger Pau Monné
0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2022-02-15 14:45 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Julien Grall, Andrew Cooper, George Dunlap, Stefano Stabellini,
Wei Liu, xen-devel
On 15.02.2022 15:16, Roger Pau Monné wrote:
> On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
>> On 15.02.2022 14:13, Julien Grall wrote:
>>> On 15/02/2022 09:39, Roger Pau Monne wrote:
>>>> There's no need to subtract _QR_BIAS from the lock value for storing
>>>> in the local cnts variable in the read lock slow path: the users of
>>>> the value in cnts only care about the writer-related bits and use a
>>>> mask to get the value.
>>>>
>>>> Note that further setting of cnts in rspin_until_writer_unlock already
>>>> do not subtract _QR_BIAS.
>>>
>>> The rwlock is a copy of the Linux implementation. So I looked at the
>>> history to find out why _QR_BIAS was substracted.
>>>
>>> It looks like this was done to get better assembly on x86:
>>>
>>> commit f9852b74bec0117b888da39d070c323ea1cb7f4c
>>> Author: Peter Zijlstra <peterz@infradead.org>
>>> Date: Mon Apr 18 01:27:03 2016 +0200
>>>
>>> locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
>>>
>>> The only reason for the current code is to make GCC emit only the
>>> "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
>>> the result), do so nicer.
>>>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Acked-by: Waiman Long <waiman.long@hpe.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: linux-arch@vger.kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>>
>>> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
>>> index fec082338668..19248ddf37ce 100644
>>> --- a/kernel/locking/qrwlock.c
>>> +++ b/kernel/locking/qrwlock.c
>>> @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock,
>>> u32 cnts)
>>> * that accesses can't leak upwards out of our subsequent critical
>>> * section in the case that the lock is currently held for write.
>>> */
>>> - cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
>>> + cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
>>> rspin_until_writer_unlock(lock, cnts);
>>>
>>> /*
>>>
>>> This is a slowpath, so probably not a concern. But I thought I would
>>> double check whether the x86 folks are still happy to proceed with that
>>> in mind.
>>
>> Hmm, that's an interesting observation. Roger - did you inspect the
>> generated code? At the very least the description may want amending.
>
> It seems to always generate the same code for me when using gcc 8.3,
> I even tried using arch_fetch_and_add directly, it always results
> in:
>
> ffff82d04022d983: f0 0f c1 03 lock xadd %eax,(%rbx)
> ffff82d04022d987: 25 00 30 00 00 and $0x3000,%eax
>
> Similarly clang 13.0.0 seem to always generate:
>
> ffff82d0402085de: f0 0f c1 03 lock xadd %eax,(%rbx)
> ffff82d0402085e2: 89 c1 mov %eax,%ecx
> ffff82d0402085e4: 81 e1 00 30 00 00 and $0x3000,%ecx
>
> Maybe I'm missing something, but I don't see a difference in the
> generated code.
I've looked myself in the meantime, and I can largely confirm this.
Clang 5 doesn't eliminate the "add" (or really "lea") though. But
nevertheless ...
> I could add to the commit message:
>
> "Originally _QR_BIAS was subtracted from the result of
> atomic_add_return_acquire in order to prevent GCC from emitting an
> unneeded ADD instruction. This being in the lock slow path such
> optimizations don't seem likely to make any relevant performance
> difference. Also modern GCC and CLANG versions will already avoid
> emitting the ADD instruction."
... I'm fine with this as explanation; I'd also be fine adding
this to the description while committing.
Jan
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] rwlock: remove unneeded subtraction
2022-02-15 14:45 ` Jan Beulich
@ 2022-02-15 15:40 ` Roger Pau Monné
0 siblings, 0 replies; 8+ messages in thread
From: Roger Pau Monné @ 2022-02-15 15:40 UTC (permalink / raw)
To: Jan Beulich
Cc: Julien Grall, Andrew Cooper, George Dunlap, Stefano Stabellini,
Wei Liu, xen-devel
On Tue, Feb 15, 2022 at 03:45:00PM +0100, Jan Beulich wrote:
> On 15.02.2022 15:16, Roger Pau Monné wrote:
> > I could add to the commit message:
> >
> > "Originally _QR_BIAS was subtracted from the result of
> > atomic_add_return_acquire in order to prevent GCC from emitting an
> > unneeded ADD instruction. This being in the lock slow path such
> > optimizations don't seem likely to make any relevant performance
> > difference. Also modern GCC and CLANG versions will already avoid
> > emitting the ADD instruction."
>
> ... I'm fine with this as explanation; I'd also be fine adding
> this to the description while committing.
Sure, thanks.
Roger.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-02-15 15:40 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-15 9:39 [PATCH] rwlock: remove unneeded subtraction Roger Pau Monne
2022-02-15 10:37 ` Jan Beulich
2022-02-15 11:54 ` Luca Fancellu
2022-02-15 13:13 ` Julien Grall
2022-02-15 13:22 ` Jan Beulich
2022-02-15 14:16 ` Roger Pau Monné
2022-02-15 14:45 ` Jan Beulich
2022-02-15 15:40 ` Roger Pau Monné
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.