* [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters
@ 2022-07-06 13:59 Waiman Long
  2022-07-06 14:03 ` Waiman Long
  2022-07-11  9:34 ` Peter Zijlstra
From: Waiman Long @ 2022-07-06 13:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, Thomas Gleixner, Sebastian Andrzej Siewior,
	Juri Lelli, Mike Stowell, Waiman Long

Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
stealing") allows an unlimited number of lock steals for non-RT
tasks. That can lead to lock starvation of a non-RT top waiter task if
there is a constant incoming stream of non-RT lockers. This can cause
an rcu_preempt self-detected stall or even a task lockup in a
PREEMPT_RT kernel. For example,

[77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.

Avoid this problem and ensure forward progress by limiting the
number of times that a lock can be stolen from each waiter. This
patch sets a threshold of 32. That number is arbitrary and can be
changed if needed.

Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rtmutex.c        | 9 ++++++---
 kernel/locking/rtmutex_common.h | 8 ++++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

 [v3: Increase threshold to 32 and add rcu_preempt self-detected stall]
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 7779ee8abc2a..bdddb3dc36c2 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -359,10 +359,13 @@ static inline bool rt_mutex_steal(struct rt_mutex_waiter *waiter,
 	if (rt_prio(waiter->prio) || dl_prio(waiter->prio))
 		return false;
 
-	return rt_mutex_waiter_equal(waiter, top_waiter);
-#else
-	return false;
+	if (rt_mutex_waiter_equal(waiter, top_waiter) &&
+	   (top_waiter->nr_steals < RT_MUTEX_LOCK_STEAL_MAX)) {
+		top_waiter->nr_steals++;
+		return true;
+	}
 #endif
+	return false;
 }
 
 #define __node_2_waiter(node) \
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index c47e8361bfb5..d5eeba955f07 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -26,6 +26,7 @@
  * @task:		task reference to the blocked task
  * @lock:		Pointer to the rt_mutex on which the waiter blocks
  * @wake_state:		Wakeup state to use (TASK_NORMAL or TASK_RTLOCK_WAIT)
+ * @nr_steals:		Number of times the lock is stolen
  * @prio:		Priority of the waiter
  * @deadline:		Deadline of the waiter if applicable
  * @ww_ctx:		WW context pointer
@@ -36,11 +37,17 @@ struct rt_mutex_waiter {
 	struct task_struct	*task;
 	struct rt_mutex_base	*lock;
 	unsigned int		wake_state;
+	unsigned int		nr_steals;
 	int			prio;
 	u64			deadline;
 	struct ww_acquire_ctx	*ww_ctx;
 };
 
+/*
+ * The maximum number of times a lock can be stolen from each waiter.
+ */
+#define	RT_MUTEX_LOCK_STEAL_MAX	(1U << 5)
+
 /**
  * rt_wake_q_head - Wrapper around regular wake_q_head to support
  *		    "sleeping" spinlocks on RT
@@ -194,6 +201,7 @@ static inline void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
 	RB_CLEAR_NODE(&waiter->tree_entry);
 	waiter->wake_state = TASK_NORMAL;
 	waiter->task = NULL;
+	waiter->nr_steals = 0;
 }
 
 static inline void rt_mutex_init_rtlock_waiter(struct rt_mutex_waiter *waiter)
-- 
2.31.1



* Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters
  2022-07-06 13:59 [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters Waiman Long
@ 2022-07-06 14:03 ` Waiman Long
  2022-07-07 18:22   ` Boqun Feng
  2022-07-11  9:34 ` Peter Zijlstra
From: Waiman Long @ 2022-07-06 14:03 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, Thomas Gleixner, Sebastian Andrzej Siewior,
	Juri Lelli, Mike Stowell

On 7/6/22 09:59, Waiman Long wrote:
> Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
> stealing") allows an unlimited number of lock steals for non-RT
> tasks. That can lead to lock starvation of a non-RT top waiter task if
> there is a constant incoming stream of non-RT lockers. This can cause
> an rcu_preempt self-detected stall or even a task lockup in a
> PREEMPT_RT kernel. For example,
>
> [77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
> [ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.
>
> Avoid this problem and ensure forward progress by limiting the
> number of times that a lock can be stolen from each waiter. This
> patch sets a threshold of 32. That number is arbitrary and can be
> changed if needed.
>
> Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   kernel/locking/rtmutex.c        | 9 ++++++---
>   kernel/locking/rtmutex_common.h | 8 ++++++++
>   2 files changed, 14 insertions(+), 3 deletions(-)
>
>   [v3: Increase threshold to 32 and add rcu_preempt self-detected stall]

Note that I decided to increase the threshold from 10 to 32 to reduce
the potential performance impact of this change, if any. We also found
that this patch fixes some of the rcu_preempt self-detected stall
problems that we saw with the PREEMPT_RT kernel, so I added that
information to the patch description.

Cheers,
Longman



* Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters
  2022-07-06 14:03 ` Waiman Long
@ 2022-07-07 18:22   ` Boqun Feng
  2022-07-07 18:45     ` Waiman Long
From: Boqun Feng @ 2022-07-07 18:22 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Mike Stowell

On Wed, Jul 06, 2022 at 10:03:10AM -0400, Waiman Long wrote:
> On 7/6/22 09:59, Waiman Long wrote:
> > Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
> > stealing") allows an unlimited number of lock steals for non-RT
> > tasks. That can lead to lock starvation of a non-RT top waiter task if
> > there is a constant incoming stream of non-RT lockers. This can cause
> > an rcu_preempt self-detected stall or even a task lockup in a
> > PREEMPT_RT kernel. For example,
> > 
> > [77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
> > [ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.
> > 
> > Avoid this problem and ensure forward progress by limiting the
> > number of times that a lock can be stolen from each waiter. This
> > patch sets a threshold of 32. That number is arbitrary and can be
> > changed if needed.
> > 
> > Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
> > Signed-off-by: Waiman Long <longman@redhat.com>
> > ---
> >   kernel/locking/rtmutex.c        | 9 ++++++---
> >   kernel/locking/rtmutex_common.h | 8 ++++++++
> >   2 files changed, 14 insertions(+), 3 deletions(-)
> > 
> >   [v3: Increase threshold to 32 and add rcu_preempt self-detected stall]
> 
> Note that I decided to increase the threshold from 10 to 32 to reduce
> the potential performance impact of this change, if any. We also found
> that this patch fixes some of the rcu_preempt self-detected stall
> problems that we saw with the PREEMPT_RT kernel, so I added that
> information to the patch description.
> 

Have you considered (and tested) whether we can set the threshold
directly proportional to nr_cpu_ids? Because IIUC, the favorable case
for lock stealing is that every CPU gets a chance to steal once. If one
CPU can steal twice, then 1) either there is a context switch between
two tasks, which costs about as much as waking up the waiter, or 2) a
task drops and re-grabs the lock, which means the task wants to yield
to the other waiters of the lock.
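
Just to illustrate (untested, and the upper clamp is an arbitrary
number I made up), a sketch of what that could look like on top of
your patch:

	/*
	 * Hypothetical variant of the v3 patch: scale the per-waiter
	 * steal limit with the number of CPUs instead of using a
	 * fixed constant. The clamp bounds the worst-case top-waiter
	 * latency on very large machines.
	 */
	static inline unsigned int rt_mutex_steal_max(void)
	{
		return min(nr_cpu_ids, 1024U);
	}

	/* in rt_mutex_steal(), replacing the constant threshold: */
	if (rt_mutex_waiter_equal(waiter, top_waiter) &&
	    top_waiter->nr_steals < rt_mutex_steal_max()) {
		top_waiter->nr_steals++;
		return true;
	}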

Regards,
Boqun

> Cheers,
> Longman
> 


* Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters
  2022-07-07 18:22   ` Boqun Feng
@ 2022-07-07 18:45     ` Waiman Long
  2022-07-07 19:04       ` Boqun Feng
From: Waiman Long @ 2022-07-07 18:45 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Mike Stowell

On 7/7/22 14:22, Boqun Feng wrote:
> On Wed, Jul 06, 2022 at 10:03:10AM -0400, Waiman Long wrote:
>> On 7/6/22 09:59, Waiman Long wrote:
>>> Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
>>> stealing") allows an unlimited number of lock steals for non-RT
>>> tasks. That can lead to lock starvation of a non-RT top waiter task if
>>> there is a constant incoming stream of non-RT lockers. This can cause
>>> an rcu_preempt self-detected stall or even a task lockup in a
>>> PREEMPT_RT kernel. For example,
>>>
>>> [77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
>>> [ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.
>>>
>>> Avoid this problem and ensure forward progress by limiting the
>>> number of times that a lock can be stolen from each waiter. This
>>> patch sets a threshold of 32. That number is arbitrary and can be
>>> changed if needed.
>>>
>>> Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>    kernel/locking/rtmutex.c        | 9 ++++++---
>>>    kernel/locking/rtmutex_common.h | 8 ++++++++
>>>    2 files changed, 14 insertions(+), 3 deletions(-)
>>>
>>>    [v3: Increase threshold to 32 and add rcu_preempt self-detected stall]
>> Note that I decided to increase the threshold from 10 to 32 to reduce
>> the potential performance impact of this change, if any. We also found
>> that this patch fixes some of the rcu_preempt self-detected stall
>> problems that we saw with the PREEMPT_RT kernel, so I added that
>> information to the patch description.
>>
> Have you considered (and tested) whether we can set the threshold
> directly proportional to nr_cpu_ids? Because IIUC, the favorable case
> for lock stealing is that every CPU gets a chance to steal once. If one
> CPU can steal twice, then 1) either there is a context switch between
> two tasks, which costs about as much as waking up the waiter, or 2) a
> task drops and re-grabs the lock, which means the task wants to yield
> to the other waiters of the lock.

There is no inherent reason to prevent the same cpu from stealing the
lock twice or more. With rtmutex, the top waiter may be sleeping and the
wakeup latency can be considerable. Allowing another ready lock waiter
to steal the lock and put it to productive use can improve system
throughput. There is no fairness in lock stealing, and I don't believe
it is a worthwhile goal to allow each cpu to steal the lock once. It
will just complicate the code.

On the other hand, unlimited lock stealing is bad and we have to put a
limit on it somehow to ensure forward progress.

Cheers,
Longman



* Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters
  2022-07-07 18:45     ` Waiman Long
@ 2022-07-07 19:04       ` Boqun Feng
  2022-07-07 19:25         ` Waiman Long
From: Boqun Feng @ 2022-07-07 19:04 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Mike Stowell

On Thu, Jul 07, 2022 at 02:45:10PM -0400, Waiman Long wrote:
> On 7/7/22 14:22, Boqun Feng wrote:
> > On Wed, Jul 06, 2022 at 10:03:10AM -0400, Waiman Long wrote:
> > > On 7/6/22 09:59, Waiman Long wrote:
> > > > Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
> > > > stealing") allows an unlimited number of lock steals for non-RT
> > > > tasks. That can lead to lock starvation of a non-RT top waiter task if
> > > > there is a constant incoming stream of non-RT lockers. This can cause
> > > > an rcu_preempt self-detected stall or even a task lockup in a
> > > > PREEMPT_RT kernel. For example,
> > > > 
> > > > [77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
> > > > [ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.
> > > > 
> > > > Avoid this problem and ensure forward progress by limiting the
> > > > number of times that a lock can be stolen from each waiter. This
> > > > patch sets a threshold of 32. That number is arbitrary and can be
> > > > changed if needed.
> > > > 
> > > > Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
> > > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > > ---
> > > >    kernel/locking/rtmutex.c        | 9 ++++++---
> > > >    kernel/locking/rtmutex_common.h | 8 ++++++++
> > > >    2 files changed, 14 insertions(+), 3 deletions(-)
> > > > 
> > > >    [v3: Increase threshold to 32 and add rcu_preempt self-detected stall]
> > > Note that I decided to increase the threshold from 10 to 32 to reduce
> > > the potential performance impact of this change, if any. We also found
> > > that this patch fixes some of the rcu_preempt self-detected stall
> > > problems that we saw with the PREEMPT_RT kernel, so I added that
> > > information to the patch description.
> > > 
> > Have you considered (and tested) whether we can set the threshold
> > directly proportional to nr_cpu_ids? Because IIUC, the favorable case
> > for lock stealing is that every CPU gets a chance to steal once. If one
> > CPU can steal twice, then 1) either there is a context switch between
> > two tasks, which costs about as much as waking up the waiter, or 2) a
> > task drops and re-grabs the lock, which means the task wants to yield
> > to the other waiters of the lock.
> 
> There is no inherent reason to prevent the same cpu from stealing the
> lock twice or more. With rtmutex, the top waiter may be sleeping and the

Well, I'm not saying we need to restrict the same cpu from stealing a
lock twice or more. Think about this: a task running on CPU 1 has
already stolen a lock once, for example:

	<lock release>
	{task C is the top waiter}

	CPU 1
	=====
	<now task A running>
	lock(); // steal the lock
	...
	unlock():
	  // set owner to NULL
	  <switch to task B> // similar cost to waking up A
	  lock(); // steal the lock

This means that if a CPU steals a lock twice or more, it's almost
certain that a context switch happened between the two steals ("almost"
because there could be a case where task A does lock()+unlock() twice,
but as I said, that means task A is willing to yield).

Therefore if there are @nr_cpu_ids lock steals, it means either there
was a context switch somewhere or a task was willing to yield, and I
think that's a reasonable signal to stop lock stealing.

Thoughts?

Regards,
Boqun

> wakeup latency can be considerable. Allowing another ready lock waiter
> to steal the lock and put it to productive use can improve system
> throughput. There is no fairness in lock stealing, and I don't believe
> it is a worthwhile goal to allow each cpu to steal the lock once. It
> will just complicate the code.
>
> On the other hand, unlimited lock stealing is bad and we have to put a
> limit on it somehow to ensure forward progress.
> 
> Cheers,
> Longman
> 


* Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters
  2022-07-07 19:04       ` Boqun Feng
@ 2022-07-07 19:25         ` Waiman Long
From: Waiman Long @ 2022-07-07 19:25 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Mike Stowell

On 7/7/22 15:04, Boqun Feng wrote:
> On Thu, Jul 07, 2022 at 02:45:10PM -0400, Waiman Long wrote:
>> On 7/7/22 14:22, Boqun Feng wrote:
>>> On Wed, Jul 06, 2022 at 10:03:10AM -0400, Waiman Long wrote:
>>>> On 7/6/22 09:59, Waiman Long wrote:
>>>>> Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
>>>>> stealing") allows an unlimited number of lock steals for non-RT
>>>>> tasks. That can lead to lock starvation of a non-RT top waiter task if
>>>>> there is a constant incoming stream of non-RT lockers. This can cause
>>>>> an rcu_preempt self-detected stall or even a task lockup in a
>>>>> PREEMPT_RT kernel. For example,
>>>>>
>>>>> [77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
>>>>> [ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.
>>>>>
>>>>> Avoid this problem and ensure forward progress by limiting the
>>>>> number of times that a lock can be stolen from each waiter. This
>>>>> patch sets a threshold of 32. That number is arbitrary and can be
>>>>> changed if needed.
>>>>>
>>>>> Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>> ---
>>>>>     kernel/locking/rtmutex.c        | 9 ++++++---
>>>>>     kernel/locking/rtmutex_common.h | 8 ++++++++
>>>>>     2 files changed, 14 insertions(+), 3 deletions(-)
>>>>>
>>>>>     [v3: Increase threshold to 32 and add rcu_preempt self-detected stall]
>>>> Note that I decided to increase the threshold from 10 to 32 to reduce
>>>> the potential performance impact of this change, if any. We also found
>>>> that this patch fixes some of the rcu_preempt self-detected stall
>>>> problems that we saw with the PREEMPT_RT kernel, so I added that
>>>> information to the patch description.
>>>>
>>> Have you considered (and tested) whether we can set the threshold
>>> directly proportional to nr_cpu_ids? Because IIUC, the favorable case
>>> for lock stealing is that every CPU gets a chance to steal once. If one
>>> CPU can steal twice, then 1) either there is a context switch between
>>> two tasks, which costs about as much as waking up the waiter, or 2) a
>>> task drops and re-grabs the lock, which means the task wants to yield
>>> to the other waiters of the lock.
>> There is no inherent reason to prevent the same cpu from stealing the
>> lock twice or more. With rtmutex, the top waiter may be sleeping and the
> Well, I'm not saying we need to restrict the same cpu from stealing a
> lock twice or more. Think about this: a task running on CPU 1 has
> already stolen a lock once, for example:
>
> 	<lock release>
> 	{task C is the top waiter}
>
> 	CPU 1
> 	=====
> 	<now task A running>
> 	lock(); // steal the lock
> 	...
> 	unlock():
> 	  // set owner to NULL
> 	  <switch to task B> // similar cost to waking up A
> 	  lock(); // steal the lock
>
> This means that if a CPU steals a lock twice or more, it's almost
> certain that a context switch happened between the two steals ("almost"
> because there could be a case where task A does lock()+unlock() twice,
> but as I said, that means task A is willing to yield).
>
> Therefore if there are @nr_cpu_ids lock steals, it means either there
> was a context switch somewhere or a task was willing to yield, and I
> think that's a reasonable signal to stop lock stealing.
>
> Thoughts?

The reality is that a task can acquire the same lock multiple times
before a context switch, so I believe stealing a lock from the same
sleeping top waiter multiple times can certainly happen. For a large
SMP system with hundreds or even thousands of cpus, allowing that many
lock steals may significantly increase the lock acquisition latency for
the unfortunate top waiter.

Another alternative that I have used in the past is to record a
timestamp when a task becomes the top waiter and refrain from stealing
the lock once the elapsed time since that timestamp exceeds a certain
limit. That will bound the maximum lock acquisition latency of a non-RT
task as long as there are no other RT tasks competing with it for the
lock.
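
To make that concrete, a rough sketch of the timestamp approach
(hypothetical and untested; the field, the constant and its value are
made up for illustration):

	/* in struct rt_mutex_waiter */
	u64			top_waiter_ts;	/* set on becoming top waiter */

	/* arbitrary cap on how long the top waiter can be bypassed */
	#define RT_MUTEX_STEAL_TIMEOUT_NS	(10 * NSEC_PER_MSEC)

	/* in rt_mutex_steal(), replacing the nr_steals check: */
	if (rt_mutex_waiter_equal(waiter, top_waiter) &&
	    sched_clock() - top_waiter->top_waiter_ts <
	    RT_MUTEX_STEAL_TIMEOUT_NS)
		return true;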

Cheers,
Longman



* Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters
  2022-07-06 13:59 [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters Waiman Long
  2022-07-06 14:03 ` Waiman Long
@ 2022-07-11  9:34 ` Peter Zijlstra
  2022-07-11 20:02   ` Waiman Long
From: Peter Zijlstra @ 2022-07-11  9:34 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Mike Stowell

On Wed, Jul 06, 2022 at 09:59:16AM -0400, Waiman Long wrote:
> Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
> stealing") allows an unlimited number of lock steals for non-RT
> tasks. That can lead to lock starvation of a non-RT top waiter task if
> there is a constant incoming stream of non-RT lockers. This can cause
> an rcu_preempt self-detected stall or even a task lockup in a
> PREEMPT_RT kernel. For example,
> 
> [77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
> [ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.
> 
> Avoid this problem and ensure forward progress by limiting the
> number of times that a lock can be stolen from each waiter. This
> patch sets a threshold of 32. That number is arbitrary and can be
> changed if needed.
> 

Why not do the same thing we do for regular mutexes?


* Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters
  2022-07-11  9:34 ` Peter Zijlstra
@ 2022-07-11 20:02   ` Waiman Long
From: Waiman Long @ 2022-07-11 20:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel,
	Thomas Gleixner, Sebastian Andrzej Siewior, Juri Lelli,
	Mike Stowell

On 7/11/22 05:34, Peter Zijlstra wrote:
> On Wed, Jul 06, 2022 at 09:59:16AM -0400, Waiman Long wrote:
>> Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
>> stealing") allows an unlimited number of lock steals for non-RT
>> tasks. That can lead to lock starvation of a non-RT top waiter task if
>> there is a constant incoming stream of non-RT lockers. This can cause
>> an rcu_preempt self-detected stall or even a task lockup in a
>> PREEMPT_RT kernel. For example,
>>
>> [77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
>> [ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.
>>
>> Avoid this problem and ensure forward progress by limiting the
>> number of times that a lock can be stolen from each waiter. This
>> patch sets a threshold of 32. That number is arbitrary and can be
>> changed if needed.
>>
> Why not do the same thing we do for regular mutexes?
>
The mutex way is another possible alternative. We can set a flag to
disable lock stealing when the current top waiter wakes up and finds
that the rtmutex has been stolen. I will need to run some tests to find
out how many times lock stealing can happen before it is blocked. I
would like to allow a sufficient number of lock steals to minimize the
performance impact of this change.
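
Something along these lines (hypothetical and untested; the flag name
just mirrors what mutex does with MUTEX_FLAG_HANDOFF):

	/* in struct rt_mutex_waiter */
	bool			handoff;	/* top waiter wants the lock */

	/* top waiter, on waking up and finding the lock was stolen: */
	WRITE_ONCE(waiter->handoff, true);

	/* in rt_mutex_steal(): refuse further steals after that */
	if (rt_mutex_waiter_equal(waiter, top_waiter) &&
	    !READ_ONCE(top_waiter->handoff))
		return true;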

Cheers,
Longman


