linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] locking/rwsem: Teach contention tracing about optimistic spinning
@ 2022-04-27 15:52 Davidlohr Bueso
  2022-04-27 21:26 ` Namhyung Kim
  0 siblings, 1 reply; 4+ messages in thread
From: Davidlohr Bueso @ 2022-04-27 15:52 UTC (permalink / raw)
  To: peterz; +Cc: namhyung, longman, will, boqun.feng, mingo, dave, linux-kernel

Similar to the mutex counterpart, we can further distinguish the
types of contention. Similarly this patch also introduces potentially
various _begin() tracepoints with a single respective _end().

- The original _begin() for down_write() is moved further up,
right after we know optimistic spinning is bust.

- For the scenario that spins after setting the hand-off bit and
failing to grab the lock the code is change to duplicate the
rwsem_try_write_lock() upon a OWNER_NULL, which minimizes the
amounts of _begin() in the wait-loop - but also gets rid of a
goto label and the logic is pretty encapsulated in the branch.
In such cases the common case will be to acquire the lock,
but if it is stolen in that window, this path won't see the
signal_pending() now in the iteration and block. This should
be benign as the check will come in after waking if the lock
can still not be taken.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
rtmutexes' top waiter will also do optimitic spinning, but
I don't think it is worth adding tracepoints here as it
is quite minimal, unlike the osq stuff.

 include/trace/events/lock.h |  4 +++-
 kernel/locking/rwsem.c      | 29 +++++++++++++++++++++--------
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
index 9ebd081e057e..0f68a3e69a9f 100644
--- a/include/trace/events/lock.h
+++ b/include/trace/events/lock.h
@@ -15,6 +15,7 @@
 #define LCB_F_RT	(1U << 3)
 #define LCB_F_PERCPU	(1U << 4)
 #define LCB_F_MUTEX	(1U << 5)
+#define LCB_F_RWSEM	(1U << 6)
 
 
 #ifdef CONFIG_LOCKDEP
@@ -115,7 +116,8 @@ TRACE_EVENT(contention_begin,
 				{ LCB_F_WRITE,		"WRITE" },
 				{ LCB_F_RT,		"RT" },
 				{ LCB_F_PERCPU,		"PERCPU" },
-				{ LCB_F_MUTEX,		"MUTEX" }
+				{ LCB_F_MUTEX,		"MUTEX" },
+				{ LCB_F_RWSEM,		"RWSEM" }
 			  ))
 );
 
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 9d1db4a54d34..c38f990cacea 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1057,7 +1057,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	if (!wake_q_empty(&wake_q))
 		wake_up_q(&wake_q);
 
-	trace_contention_begin(sem, LCB_F_READ);
+	trace_contention_begin(sem, LCB_F_RWSEM | LCB_F_READ);
 
 	/* wait to be given the lock */
 	for (;;) {
@@ -1101,9 +1101,14 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
-	if (rwsem_can_spin_on_owner(sem) && rwsem_optimistic_spin(sem)) {
-		/* rwsem_optimistic_spin() implies ACQUIRE on success */
-		return sem;
+	if (rwsem_can_spin_on_owner(sem)) {
+		trace_contention_begin(sem,
+				       LCB_F_RWSEM | LCB_F_WRITE | LCB_F_SPIN);
+		if (rwsem_optimistic_spin(sem)) {
+			/* rwsem_optimistic_spin() implies ACQUIRE on success */
+			trace_contention_end(sem, 0);
+			return sem;
+		}
 	}
 
 	/*
@@ -1115,6 +1120,8 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
 	waiter.handoff_set = false;
 
+	trace_contention_begin(sem, LCB_F_RWSEM | LCB_F_WRITE);
+
 	raw_spin_lock_irq(&sem->wait_lock);
 	rwsem_add_waiter(sem, &waiter);
 
@@ -1137,7 +1144,6 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
-	trace_contention_begin(sem, LCB_F_WRITE);
 
 	for (;;) {
 		if (rwsem_try_write_lock(sem, &waiter)) {
@@ -1161,18 +1167,25 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		if (waiter.handoff_set) {
 			enum owner_state owner_state;
 
+			trace_contention_begin(sem, LCB_F_RWSEM |
+					       LCB_F_WRITE | LCB_F_SPIN);
 			preempt_disable();
 			owner_state = rwsem_spin_on_owner(sem);
 			preempt_enable();
 
-			if (owner_state == OWNER_NULL)
-				goto trylock_again;
+			if (owner_state == OWNER_NULL) {
+				raw_spin_lock_irq(&sem->wait_lock);
+				if (rwsem_try_write_lock(sem, &waiter))
+					break;
+				raw_spin_unlock_irq(&sem->wait_lock);
+			}
+
+			trace_contention_begin(sem, LCB_F_RWSEM | LCB_F_WRITE);
 		}
 
 		schedule();
 		lockevent_inc(rwsem_sleep_writer);
 		set_current_state(state);
-trylock_again:
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
 	__set_current_state(TASK_RUNNING);
-- 
2.36.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] locking/rwsem: Teach contention tracing about optimistic spinning
  2022-04-27 15:52 [PATCH] locking/rwsem: Teach contention tracing about optimistic spinning Davidlohr Bueso
@ 2022-04-27 21:26 ` Namhyung Kim
  2022-04-29 18:45   ` Davidlohr Bueso
  0 siblings, 1 reply; 4+ messages in thread
From: Namhyung Kim @ 2022-04-27 21:26 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Waiman Long, Will Deacon, Boqun Feng,
	Ingo Molnar, linux-kernel

Hi Davidlohr,

On Wed, Apr 27, 2022 at 9:04 AM Davidlohr Bueso <dave@stgolabs.net> wrote:
>
> Similar to the mutex counterpart, we can further distinguish the
> types of contention. Similarly this patch also introduces potentially
> various _begin() tracepoints with a single respective _end().

Thanks for doing this.  I was thinking about it as well.

>
> - The original _begin() for down_write() is moved further up,
> right after we know optimistic spinning is bust.
>
> - For the scenario that spins after setting the hand-off bit and
> failing to grab the lock the code is change to duplicate the
> rwsem_try_write_lock() upon a OWNER_NULL, which minimizes the
> amounts of _begin() in the wait-loop - but also gets rid of a
> goto label and the logic is pretty encapsulated in the branch.
> In such cases the common case will be to acquire the lock,
> but if it is stolen in that window, this path won't see the
> signal_pending() now in the iteration and block. This should
> be benign as the check will come in after waking if the lock
> can still not be taken.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> rtmutexes' top waiter will also do optimitic spinning, but
> I don't think it is worth adding tracepoints here as it
> is quite minimal, unlike the osq stuff.
>
>  include/trace/events/lock.h |  4 +++-
>  kernel/locking/rwsem.c      | 29 +++++++++++++++++++++--------
>  2 files changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
> index 9ebd081e057e..0f68a3e69a9f 100644
> --- a/include/trace/events/lock.h
> +++ b/include/trace/events/lock.h
> @@ -15,6 +15,7 @@
>  #define LCB_F_RT       (1U << 3)
>  #define LCB_F_PERCPU   (1U << 4)
>  #define LCB_F_MUTEX    (1U << 5)
> +#define LCB_F_RWSEM    (1U << 6)
>
>
>  #ifdef CONFIG_LOCKDEP
> @@ -115,7 +116,8 @@ TRACE_EVENT(contention_begin,
>                                 { LCB_F_WRITE,          "WRITE" },
>                                 { LCB_F_RT,             "RT" },
>                                 { LCB_F_PERCPU,         "PERCPU" },
> -                               { LCB_F_MUTEX,          "MUTEX" }
> +                               { LCB_F_MUTEX,          "MUTEX" },
> +                               { LCB_F_RWSEM,          "RWSEM" }
>                           ))
>  );

Well I'm ok with this but it'd be better if we can do this
without adding a new flag.  Originally a mutex can be
identified with 0, and a rwsem with either of READ or WRITE.

The MUTEX flag was added to note optimistic spins
on mutex and now we need something similar for
rwsem.  Then can we change the MUTEX to OPTIMISTIC
if it's not too late?

>
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 9d1db4a54d34..c38f990cacea 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1057,7 +1057,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
>         if (!wake_q_empty(&wake_q))
>                 wake_up_q(&wake_q);
>
> -       trace_contention_begin(sem, LCB_F_READ);
> +       trace_contention_begin(sem, LCB_F_RWSEM | LCB_F_READ);
>
>         /* wait to be given the lock */
>         for (;;) {
> @@ -1101,9 +1101,14 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>         DEFINE_WAKE_Q(wake_q);
>
>         /* do optimistic spinning and steal lock if possible */
> -       if (rwsem_can_spin_on_owner(sem) && rwsem_optimistic_spin(sem)) {
> -               /* rwsem_optimistic_spin() implies ACQUIRE on success */
> -               return sem;
> +       if (rwsem_can_spin_on_owner(sem)) {
> +               trace_contention_begin(sem,
> +                                      LCB_F_RWSEM | LCB_F_WRITE | LCB_F_SPIN);
> +               if (rwsem_optimistic_spin(sem)) {
> +                       /* rwsem_optimistic_spin() implies ACQUIRE on success */
> +                       trace_contention_end(sem, 0);
> +                       return sem;
> +               }
>         }
>
>         /*
> @@ -1115,6 +1120,8 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>         waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
>         waiter.handoff_set = false;
>
> +       trace_contention_begin(sem, LCB_F_RWSEM | LCB_F_WRITE);
> +
>         raw_spin_lock_irq(&sem->wait_lock);
>         rwsem_add_waiter(sem, &waiter);
>
> @@ -1137,7 +1144,6 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>
>         /* wait until we successfully acquire the lock */
>         set_current_state(state);
> -       trace_contention_begin(sem, LCB_F_WRITE);
>
>         for (;;) {
>                 if (rwsem_try_write_lock(sem, &waiter)) {
> @@ -1161,18 +1167,25 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>                 if (waiter.handoff_set) {
>                         enum owner_state owner_state;
>
> +                       trace_contention_begin(sem, LCB_F_RWSEM |
> +                                              LCB_F_WRITE | LCB_F_SPIN);
>                         preempt_disable();
>                         owner_state = rwsem_spin_on_owner(sem);
>                         preempt_enable();
>
> -                       if (owner_state == OWNER_NULL)
> -                               goto trylock_again;
> +                       if (owner_state == OWNER_NULL) {
> +                               raw_spin_lock_irq(&sem->wait_lock);
> +                               if (rwsem_try_write_lock(sem, &waiter))
> +                                       break;
> +                               raw_spin_unlock_irq(&sem->wait_lock);
> +                       }
> +
> +                       trace_contention_begin(sem, LCB_F_RWSEM | LCB_F_WRITE);

I'm afraid that it'd generate many contention_begin
trace events for a single lock acquisition.

Thanks,
Namhyung

>                 }
>
>                 schedule();
>                 lockevent_inc(rwsem_sleep_writer);
>                 set_current_state(state);
> -trylock_again:
>                 raw_spin_lock_irq(&sem->wait_lock);
>         }
>         __set_current_state(TASK_RUNNING);
> --
> 2.36.0
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] locking/rwsem: Teach contention tracing about optimistic spinning
  2022-04-27 21:26 ` Namhyung Kim
@ 2022-04-29 18:45   ` Davidlohr Bueso
  2022-05-31 21:08     ` Namhyung Kim
  0 siblings, 1 reply; 4+ messages in thread
From: Davidlohr Bueso @ 2022-04-29 18:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Waiman Long, Will Deacon, Boqun Feng,
	Ingo Molnar, linux-kernel

Sorry for the late reply.

On Wed, 27 Apr 2022, Namhyung Kim wrote:

>Hi Davidlohr,
>
>On Wed, Apr 27, 2022 at 9:04 AM Davidlohr Bueso <dave@stgolabs.net> wrote:
>>
>> Similar to the mutex counterpart, we can further distinguish the
>> types of contention. Similarly this patch also introduces potentially
>> various _begin() tracepoints with a single respective _end().
>
>Thanks for doing this.  I was thinking about it as well.

I really like your work on this. I've always wanted a low overhead
equivalent-ish of lock_stat, which could be used in production and
look forward to see these tracepoints put to good use.

>> @@ -115,7 +116,8 @@ TRACE_EVENT(contention_begin,
>>                                 { LCB_F_WRITE,          "WRITE" },
>>                                 { LCB_F_RT,             "RT" },
>>                                 { LCB_F_PERCPU,         "PERCPU" },
>> -                               { LCB_F_MUTEX,          "MUTEX" }
>> +                               { LCB_F_MUTEX,          "MUTEX" },
>> +                               { LCB_F_RWSEM,          "RWSEM" }
>>                           ))
>>  );
>
>Well I'm ok with this but it'd be better if we can do this
>without adding a new flag.  Originally a mutex can be
>identified with 0, and a rwsem with either of READ or WRITE.
>
>The MUTEX flag was added to note optimistic spins
>on mutex and now we need something similar for
>rwsem.  Then can we change the MUTEX to OPTIMISTIC
>if it's not too late?

Ok. Perhaps name it OSQ? I had thought of that but at the
time was also thinking about potentially the rtmutex and
rt spinlock spinning too - which don't use osq so the name
would fit in that sense.

While not in Linus' tree, I wouldn't think it's too late.

>>         for (;;) {
>>                 if (rwsem_try_write_lock(sem, &waiter)) {
>> @@ -1161,18 +1167,25 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
>>                 if (waiter.handoff_set) {
>>                         enum owner_state owner_state;
>>
>> +                       trace_contention_begin(sem, LCB_F_RWSEM |
>> +                                              LCB_F_WRITE | LCB_F_SPIN);
>>                         preempt_disable();
>>                         owner_state = rwsem_spin_on_owner(sem);
>>                         preempt_enable();
>>
>> -                       if (owner_state == OWNER_NULL)
>> -                               goto trylock_again;
>> +                       if (owner_state == OWNER_NULL) {
>> +                               raw_spin_lock_irq(&sem->wait_lock);
>> +                               if (rwsem_try_write_lock(sem, &waiter))
>> +                                       break;
>> +                               raw_spin_unlock_irq(&sem->wait_lock);
>> +                       }
>> +
>> +                       trace_contention_begin(sem, LCB_F_RWSEM | LCB_F_WRITE);
>
>I'm afraid that it'd generate many contention_begin
>trace events for a single lock acquisition.

You are right, lets just trace the "normal" optimistic spinning
at the start of the write slowpath.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] locking/rwsem: Teach contention tracing about optimistic spinning
  2022-04-29 18:45   ` Davidlohr Bueso
@ 2022-05-31 21:08     ` Namhyung Kim
  0 siblings, 0 replies; 4+ messages in thread
From: Namhyung Kim @ 2022-05-31 21:08 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Waiman Long, Will Deacon, Boqun Feng,
	Ingo Molnar, linux-kernel

Hi Davidlohr,

On Fri, Apr 29, 2022 at 11:56 AM Davidlohr Bueso <dave@stgolabs.net> wrote:
>
> Sorry for the late reply.
>
> On Wed, 27 Apr 2022, Namhyung Kim wrote:
>
> >Hi Davidlohr,
> >
> >On Wed, Apr 27, 2022 at 9:04 AM Davidlohr Bueso <dave@stgolabs.net> wrote:
> >>
> >> Similar to the mutex counterpart, we can further distinguish the
> >> types of contention. Similarly this patch also introduces potentially
> >> various _begin() tracepoints with a single respective _end().
> >
> >Thanks for doing this.  I was thinking about it as well.
>
> I really like your work on this. I've always wanted a low overhead
> equivalent-ish of lock_stat, which could be used in production and
> look forward to see these tracepoints put to good use.
>
> >> @@ -115,7 +116,8 @@ TRACE_EVENT(contention_begin,
> >>                                 { LCB_F_WRITE,          "WRITE" },
> >>                                 { LCB_F_RT,             "RT" },
> >>                                 { LCB_F_PERCPU,         "PERCPU" },
> >> -                               { LCB_F_MUTEX,          "MUTEX" }
> >> +                               { LCB_F_MUTEX,          "MUTEX" },
> >> +                               { LCB_F_RWSEM,          "RWSEM" }
> >>                           ))
> >>  );
> >
> >Well I'm ok with this but it'd be better if we can do this
> >without adding a new flag.  Originally a mutex can be
> >identified with 0, and a rwsem with either of READ or WRITE.
> >
> >The MUTEX flag was added to note optimistic spins
> >on mutex and now we need something similar for
> >rwsem.  Then can we change the MUTEX to OPTIMISTIC
> >if it's not too late?
>
> Ok. Perhaps name it OSQ? I had thought of that but at the
> time was also thinking about potentially the rtmutex and
> rt spinlock spinning too - which don't use osq so the name
> would fit in that sense.
>
> While not in Linus' tree, I wouldn't think it's too late.

Any updates?  It's now in Linus' tree so we should change
this before the official release.  Or we can keep the current
flags and then add one like in your original code.


>
> >>         for (;;) {
> >>                 if (rwsem_try_write_lock(sem, &waiter)) {
> >> @@ -1161,18 +1167,25 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
> >>                 if (waiter.handoff_set) {
> >>                         enum owner_state owner_state;
> >>
> >> +                       trace_contention_begin(sem, LCB_F_RWSEM |
> >> +                                              LCB_F_WRITE | LCB_F_SPIN);
> >>                         preempt_disable();
> >>                         owner_state = rwsem_spin_on_owner(sem);
> >>                         preempt_enable();
> >>
> >> -                       if (owner_state == OWNER_NULL)
> >> -                               goto trylock_again;
> >> +                       if (owner_state == OWNER_NULL) {
> >> +                               raw_spin_lock_irq(&sem->wait_lock);
> >> +                               if (rwsem_try_write_lock(sem, &waiter))
> >> +                                       break;
> >> +                               raw_spin_unlock_irq(&sem->wait_lock);
> >> +                       }
> >> +
> >> +                       trace_contention_begin(sem, LCB_F_RWSEM | LCB_F_WRITE);
> >
> >I'm afraid that it'd generate many contention_begin
> >trace events for a single lock acquisition.
>
> You are right, lets just trace the "normal" optimistic spinning
> at the start of the write slowpath.

I have to admit that I overlooked the mutex code already
has the same logic.  I still prefer having less number of
events but you might want to have the same with the
mutex for the precise tracking of the spins.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-05-31 21:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-27 15:52 [PATCH] locking/rwsem: Teach contention tracing about optimistic spinning Davidlohr Bueso
2022-04-27 21:26 ` Namhyung Kim
2022-04-29 18:45   ` Davidlohr Bueso
2022-05-31 21:08     ` Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).