rcu.vger.kernel.org archive mirror
* This percpu_rwsem that always enters its reader slow path
@ 2019-07-18 21:09 Joel Fernandes
  2019-07-18 21:10 ` Joel Fernandes
  2019-07-19  7:50 ` Byungchul Park
  0 siblings, 2 replies; 4+ messages in thread
From: Joel Fernandes @ 2019-07-18 21:09 UTC
  To: rcu; +Cc: Byungchul Park

Hello friends,

Just providing an update on today's debugging of percpu_rwsem (related
to rcu-sync), which I pinged Byungchul about. Please ignore this email
if you are busy :) I am just archiving it here.

As you may know, percpu_rwsem uses the rcu-sync framework to reduce the
cost of the read side by making it free of any serializing/atomic
instructions. However, there was one semaphore which broke the rules!

I spent a couple of hours trying to figure out why
cgroup_threadgroup_rwsem always entered the reader slow path on my
system (rcu-sync turns out to be non-idle for this rwsem). I really
thought it was a bug, because what's the point of rcu-sync if it never
goes idle?

Then I landed on the commit below, and it turns out it was done for
Android and reported by John :) And the patch author was a certain guy
named Peter :)

commit 3942a9bd7b5842a924e99ee6ec1350b8006c94ec
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu Aug 11 18:54:13 2016 +0200

    locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
-----------

Basically, this commit makes the read side of percpu_rwsem slightly
more expensive (one smp_load_acquire() of readers_block), at the cost of
making the write side a bit more expensive...
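
To make the read-side cost concrete, here is a rough userspace model of the reader path in C11 atomics. It is only a sketch under stated assumptions, not the kernel code: every model_* name is made up for illustration, the per-CPU counter is modelled as a small array of relaxed atomics, and a blocked reader simply returns instead of sleeping. The point it shows is that a reader only pays for the full barrier and the acquire load of readers_block when rcu-sync is not idle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4 /* hypothetical, for illustration only */

enum model_path { MODEL_FAST, MODEL_SLOW, MODEL_BLOCKED };

/* Hypothetical stand-in for struct percpu_rw_semaphore. */
struct model_percpu_rwsem {
	atomic_bool sync_idle;     /* models rcu_sync_is_idle() */
	atomic_int readers_block;  /* set while a writer holds the lock */
	atomic_int read_count[MODEL_NR_CPUS]; /* models the per-CPU counter */
};

static void model_init(struct model_percpu_rwsem *sem, bool idle)
{
	atomic_init(&sem->sync_idle, idle);
	atomic_init(&sem->readers_block, 0);
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		atomic_init(&sem->read_count[i], 0);
}

/* Take the lock for reading on behalf of "cpu"; report the path taken. */
static enum model_path model_down_read(struct model_percpu_rwsem *sem, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	/* Both paths start by bumping this CPU's own counter. */
	atomic_fetch_add_explicit(&sem->read_count[cpu], 1,
				  memory_order_relaxed);

	/* Fast path: rcu-sync is idle, so no barrier is needed. */
	if (atomic_load_explicit(&sem->sync_idle, memory_order_relaxed))
		return MODEL_FAST;

	/* Slow path: full barrier, then check for a writer. */
	atomic_thread_fence(memory_order_seq_cst);
	if (!atomic_load_explicit(&sem->readers_block, memory_order_acquire))
		return MODEL_SLOW;

	/* A writer is active: back out (the real code would sleep here). */
	atomic_fetch_sub_explicit(&sem->read_count[cpu], 1,
				  memory_order_relaxed);
	return MODEL_BLOCKED;
}
```

With sync_idle permanently false, as observed for cgroup_threadgroup_rwsem, every call in this model returns MODEL_SLOW or MODEL_BLOCKED and never MODEL_FAST.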

So it turns out it is weird, but it is certainly not a bug.

Learned something new but wasted my time a bit :)

Cheers, and see you later,
- Joel


* Re: This percpu_rwsem that always enters its reader slow path
  2019-07-18 21:09 This percpu_rwsem that always enters its reader slow path Joel Fernandes
@ 2019-07-18 21:10 ` Joel Fernandes
  2019-07-19  7:50 ` Byungchul Park
  1 sibling, 0 replies; 4+ messages in thread
From: Joel Fernandes @ 2019-07-18 21:10 UTC
  To: rcu; +Cc: Byungchul Park

On Thu, Jul 18, 2019 at 5:09 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> Hello friends,
>
> Just providing an update on my debugging of percpu_rwsem (related to
> rcu-sync) for the day! which I pinged Byungchul about. Please ignore
> this email if you are busy :) I am just archiving it in here..
>
> As you may know, percpu_rwsem uses rcu-sync framework to reduce cost
> of read-side by making it free of any serializing/atomic instructions
> at all. However, there was one semaphore which broke the rules!
>
> I spent a couple hours trying to figure out why
> cgroup_threadgroup_rwsem always entered the reader-slow path on my
> system (RCU-sync turns out to be non-idle for this rwsem). I really
> thought it was a bug, because I felt what's the point of rcu-sync if
> it never goes idle..
>
> Then I landed on the commit below, and it turns out it was done for Android
> and reported by John :) And the patch author was a certain guy named
> Peter :)
>
> commit 3942a9bd7b5842a924e99ee6ec1350b8006c94ec
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Thu Aug 11 18:54:13 2016 +0200
>
>     locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
> -----------
>
> Basically, this commit makes the read-side cost percpu_rwsem slightly
> more expensive (one smp_load_acquire of readers_block, at the cost of
> making write-side a bit more expensive...)

I meant here: with the benefit of making the write side cheap by
avoiding synchronize_rcu()...
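
The write-side trade-off can be sketched with a small single-threaded model. It rests on an assumption that is not stated in this thread: as far as I understand the commit, it keeps the rcu-sync state of cgroup_threadgroup_rwsem permanently non-idle from init time, so a writer never finds it idle and never has to wait for a grace period. All model_* names are hypothetical, grace_periods_waited stands in for calls to synchronize_rcu(), and the exit path drops back to idle immediately, whereas the real code defers that.

```c
#include <assert.h>
#include <stdbool.h>

enum model_gp_state { MODEL_GP_IDLE, MODEL_GP_PASSED };

/* Hypothetical stand-in for struct rcu_sync. */
struct model_rcu_sync {
	enum model_gp_state gp_state;
	int gp_count;             /* writers (or pins) currently inside */
	int grace_periods_waited; /* stands in for synchronize_rcu() calls */
};

/* Pin the state as non-idle without waiting (assumed init-time hook). */
static void model_sync_enter_start(struct model_rcu_sync *rs)
{
	rs->gp_count++;
	rs->gp_state = MODEL_GP_PASSED;
}

/* Writer entry: pay for a grace period only if readers may be on the
 * fast path, i.e. the state is still idle. */
static void model_sync_enter(struct model_rcu_sync *rs)
{
	rs->gp_count++;
	if (rs->gp_state == MODEL_GP_IDLE) {
		rs->grace_periods_waited++;
		rs->gp_state = MODEL_GP_PASSED;
	}
}

/* Writer exit: the last one out lets readers use the fast path again. */
static void model_sync_exit(struct model_rcu_sync *rs)
{
	assert(rs->gp_count > 0);
	if (--rs->gp_count == 0)
		rs->gp_state = MODEL_GP_IDLE;
}

static bool model_sync_is_idle(const struct model_rcu_sync *rs)
{
	return rs->gp_state == MODEL_GP_IDLE;
}
```

In this model an unpinned instance waits for one grace period on each uncontended writer entry, while a pinned instance never waits and never reports idle, which matches the "always in the reader slow path" behaviour described above.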


* Re: This percpu_rwsem that always enters its reader slow path
  2019-07-18 21:09 This percpu_rwsem that always enters its reader slow path Joel Fernandes
  2019-07-18 21:10 ` Joel Fernandes
@ 2019-07-19  7:50 ` Byungchul Park
  2019-07-21 13:28   ` Joel Fernandes
  1 sibling, 1 reply; 4+ messages in thread
From: Byungchul Park @ 2019-07-19  7:50 UTC
  To: Joel Fernandes; +Cc: rcu, kernel-team

On Thu, Jul 18, 2019 at 05:09:45PM -0400, Joel Fernandes wrote:
> Hello friends,
> 
> Just providing an update on my debugging of percpu_rwsem (related to
> rcu-sync) for the day! which I pinged Byungchul about. Please ignore
> this email if you are busy :) I am just archiving it in here..
> 
> As you may know, percpu_rwsem uses rcu-sync framework to reduce cost
> of read-side by making it free of any serializing/atomic instructions
> at all. However, there was one semaphore which broke the rules!
> 
> I spent a couple hours trying to figure out why
> cgroup_threadgroup_rwsem always entered the reader-slow path on my
> system (RCU-sync turns out to be non-idle for this rwsem). I really
> thought it was a bug, because I felt what's the point of rcu-sync if
> it never goes idle..

Yes, with the following patch, the cgroup rwsem can no longer make use
of rcu_sync, but it still benefits from the percpu structure, as you
told me: it avoids cache bouncing and contention on a shared area, even
though every read lock keeps issuing a full SMP barrier.

What matters is which is more expensive: (1) issuing smp_mb(), or
(2) accessing shared data, sem->count, and acquiring/releasing
sem->wait_lock. I think using percpu-rwsem with the SMP barrier is
much better, even with rcu_sync disabled.

Or am I missing the point? Please let me know if so.

Thanks,
Byungchul

> Then I landed on the commit below, and it turns out it was done for Android
> and reported by John :) And the patch author was a certain guy named
> Peter :)
> 
> commit 3942a9bd7b5842a924e99ee6ec1350b8006c94ec
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Thu Aug 11 18:54:13 2016 +0200
> 
>     locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
> -----------
> 
> Basically, this commit makes the read-side cost percpu_rwsem slightly
> more expensive (one smp_load_acquire of readers_block, at the cost of
> making write-side a bit more expensive...)
> 
> So turns out it is weird, but it is certainly not a bug.
> 
> Learned something new but wasted my time a bit :)
> 
> Cheers, and see you later,
> - Joel


* Re: This percpu_rwsem that always enters its reader slow path
  2019-07-19  7:50 ` Byungchul Park
@ 2019-07-21 13:28   ` Joel Fernandes
  0 siblings, 0 replies; 4+ messages in thread
From: Joel Fernandes @ 2019-07-21 13:28 UTC
  To: Byungchul Park; +Cc: rcu, kernel-team

On Fri, Jul 19, 2019 at 04:50:11PM +0900, Byungchul Park wrote:
> On Thu, Jul 18, 2019 at 05:09:45PM -0400, Joel Fernandes wrote:
> > Hello friends,
> > 
> > Just providing an update on my debugging of percpu_rwsem (related to
> > rcu-sync) for the day! which I pinged Byungchul about. Please ignore
> > this email if you are busy :) I am just archiving it in here..
> > 
> > As you may know, percpu_rwsem uses rcu-sync framework to reduce cost
> > of read-side by making it free of any serializing/atomic instructions
> > at all. However, there was one semaphore which broke the rules!
> > 
> > I spent a couple hours trying to figure out why
> > cgroup_threadgroup_rwsem always entered the reader-slow path on my
> > system (RCU-sync turns out to be non-idle for this rwsem). I really
> > thought it was a bug, because I felt what's the point of rcu-sync if
> > it never goes idle..
> 
> Yes, with the following patch, the cgroup rwsem cannot make use of
> rcu_sync any more, but it still gets benefit from percpu structure
> as you told me like avoiding cache bouncing and contention on a shared
> area even though every read lock keeps firing smp full barrier.

Yes. So it seems to me the main benefit of RCU in percpu_rw_semaphore is to
completely avoid memory barriers in the read path, while also benefiting from
the percpu nature of the lock.

> What matters is which one is more expensive between (1) firing smp_mb
> and (2) accessing a shared data, sem->count, and acquiring/releasing
> sem->wait_lock. I think using percpu-rwsem involving the smp barrier is
> much better even with rcu_sync disabled.

Right. Fully agreed.

thanks,

 - Joel


