* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 1:01 ` Steven Rostedt
@ 2023-03-03 18:11 ` Joel Fernandes
2023-03-03 18:37 ` Steven Rostedt
2023-03-07 14:08 ` Peter Zijlstra
2023-03-06 18:30 ` John Stultz
2023-03-08 1:31 ` Steven Rostedt
2 siblings, 2 replies; 32+ messages in thread
From: Joel Fernandes @ 2023-03-03 18:11 UTC (permalink / raw)
To: Steven Rostedt
Cc: John Stultz, LKML, Wei Wang, Midas Chien, Kees Cook,
Anton Vorontsov, Guilherme G. Piccoli, Tony Luck, kernel-team,
Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior, joel
Hey Steve,
On Thu, Mar 02, 2023 at 08:01:36PM -0500, Steven Rostedt wrote:
> On Thu, 2 Mar 2023 16:56:13 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Knowing that rcu_read_lock() keeps the tasks safe, I made the optimization
> > to only grab the spinlock (and disable interrupts) once, or whenever the
> > top waiter changes.
>
> v3 as I found that there were too places to test for top waiter that had to
> be removed:
Nice patch!!! One question below:
> (I took out the trace_printk() here).
>
> -- Steve
>
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 010cf4e6d0b8..283dd8e654ef 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1399,8 +1399,12 @@ static bool rtmutex_spin_on_owner(struct rt_mutex_base *lock,
> struct rt_mutex_waiter *waiter,
> struct task_struct *owner)
> {
> + struct rt_mutex_waiter *top_waiter;
> + struct rt_mutex_waiter *last_waiter = NULL;
> + struct task_struct *top_task = NULL;
> bool res = true;
>
> + /* rcu_read_lock keeps task_structs around */
> rcu_read_lock();
> for (;;) {
> /* If owner changed, trylock again. */
> @@ -1421,11 +1425,23 @@ static bool rtmutex_spin_on_owner(struct rt_mutex_base *lock,
> * for CONFIG_PREEMPT_RCU=y)
> * - the VCPU on which owner runs is preempted
> */
> - if (!owner_on_cpu(owner) || need_resched() ||
> - !rt_mutex_waiter_is_top_waiter(lock, waiter)) {
> + if (!owner_on_cpu(owner) || need_resched()) {
> res = false;
> break;
> }
> + top_waiter = rt_mutex_top_waiter(lock);
> + if (top_waiter != waiter) {
> + if (top_waiter != last_waiter) {
> + raw_spin_lock_irq(&lock->wait_lock);
> + last_waiter = rt_mutex_top_waiter(lock);
> + top_task = last_waiter->task;
> + raw_spin_unlock_irq(&lock->wait_lock);
> + }
> + if (!owner_on_cpu(top_task)) {
> + res = false;
> + break;
> + }
> + }
In the normal mutex's adaptive spinning, there is no check for if there is a
change in waiter AFAICS (ignoring ww mutex stuff for a second).
I can see one may want to do that waiter-check, as spinning
indefinitely if the lock owner is on the CPU for too long may result in
excessing power burn. But normal mutex does not seem to do that.
What makes the rtmutex spin logic different from normal mutex in this
scenario, so that rtmutex wants to do that but normal ones dont?
Another thought is, I am wondering if all of them spinning indefinitely might
be Ok for rtmutex as well, since as you mentioned, preemption is enabled. So
adding the if (top_waiter != last_waiter) {...} might be unnecessary? In fact
may be even harmful as you are disabling interrupts in the process.
Either way, I think a comment should go on top of the "if (top_waiter !=
waiter)" check IMO.
thanks,
- Joel
> cpu_relax();
> }
> rcu_read_unlock();
> @@ -1547,10 +1563,7 @@ static int __sched rt_mutex_slowlock_block(struct rt_mutex_base *lock,
> break;
> }
>
> - if (waiter == rt_mutex_top_waiter(lock))
> - owner = rt_mutex_owner(lock);
> - else
> - owner = NULL;
> + owner = rt_mutex_owner(lock);
> raw_spin_unlock_irq(&lock->wait_lock);
>
> if (!owner || !rtmutex_spin_on_owner(lock, waiter, owner))
> @@ -1736,10 +1749,7 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
> if (try_to_take_rt_mutex(lock, current, &waiter))
> break;
>
> - if (&waiter == rt_mutex_top_waiter(lock))
> - owner = rt_mutex_owner(lock);
> - else
> - owner = NULL;
> + owner = rt_mutex_owner(lock);
> raw_spin_unlock_irq(&lock->wait_lock);
>
> if (!owner || !rtmutex_spin_on_owner(lock, &waiter, owner))
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 18:11 ` Joel Fernandes
@ 2023-03-03 18:37 ` Steven Rostedt
2023-03-03 19:25 ` Joel Fernandes
2023-03-07 14:08 ` Peter Zijlstra
1 sibling, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2023-03-03 18:37 UTC (permalink / raw)
To: Joel Fernandes
Cc: John Stultz, LKML, Wei Wang, Midas Chien, Kees Cook,
Anton Vorontsov, Guilherme G. Piccoli, Tony Luck, kernel-team,
Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior
On Fri, 3 Mar 2023 18:11:34 +0000
Joel Fernandes <joel@joelfernandes.org> wrote:
> In the normal mutex's adaptive spinning, there is no check for if there is a
> change in waiter AFAICS (ignoring ww mutex stuff for a second).
>
> I can see one may want to do that waiter-check, as spinning
> indefinitely if the lock owner is on the CPU for too long may result in
> excessing power burn. But normal mutex does not seem to do that.
>
> What makes the rtmutex spin logic different from normal mutex in this
> scenario, so that rtmutex wants to do that but normal ones dont?
Well, the point of the patch is that I don't think they should be different
;-)
>
> Another thought is, I am wondering if all of them spinning indefinitely might
> be Ok for rtmutex as well, since as you mentioned, preemption is enabled. So
> adding the if (top_waiter != last_waiter) {...} might be unnecessary? In fact
> may be even harmful as you are disabling interrupts in the process.
The last patch only does the interrupt disabling if the top waiter changes.
Which in practice is seldom.
But, I don't think a non top waiter should spin if the top waiter is not
running. The top waiter is the one that will get the lock next. If the
owner releases the lock and gives it to the top waiter, then it has to go
through the wake up of the top waiter. I don't see why a task should spin
to save a wake up if a wake up has to happen anyway.
>
> Either way, I think a comment should go on top of the "if (top_waiter !=
> waiter)" check IMO.
What type of comment?
-- Steve
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 18:37 ` Steven Rostedt
@ 2023-03-03 19:25 ` Joel Fernandes
2023-03-03 19:38 ` Steven Rostedt
0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2023-03-03 19:25 UTC (permalink / raw)
To: Steven Rostedt
Cc: John Stultz, LKML, Wei Wang, Midas Chien, Kees Cook,
Anton Vorontsov, Guilherme G. Piccoli, Tony Luck, kernel-team,
Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior
On Fri, Mar 3, 2023 at 1:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 3 Mar 2023 18:11:34 +0000
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
> > In the normal mutex's adaptive spinning, there is no check for if there is a
> > change in waiter AFAICS (ignoring ww mutex stuff for a second).
> >
> > I can see one may want to do that waiter-check, as spinning
> > indefinitely if the lock owner is on the CPU for too long may result in
> > excessing power burn. But normal mutex does not seem to do that.
> >
> > What makes the rtmutex spin logic different from normal mutex in this
> > scenario, so that rtmutex wants to do that but normal ones dont?
>
> Well, the point of the patch is that I don't think they should be different
> ;-)
But there's no "waiter change" thing for mutex_spin_on_owner right.
Then, should mutex_spin_on_owner() also add a call to
__mutex_waiter_is_first() ?
> > Another thought is, I am wondering if all of them spinning indefinitely might
> > be Ok for rtmutex as well, since as you mentioned, preemption is enabled. So
> > adding the if (top_waiter != last_waiter) {...} might be unnecessary? In fact
> > may be even harmful as you are disabling interrupts in the process.
>
> The last patch only does the interrupt disabling if the top waiter changes.
> Which in practice is seldom.
>
> But, I don't think a non top waiter should spin if the top waiter is not
> running. The top waiter is the one that will get the lock next. If the
> owner releases the lock and gives it to the top waiter, then it has to go
> through the wake up of the top waiter.
Correct me if I'm wrong, but I don't think it will go through
schedule() after spinning, which is what adds the overhead for John.
> I don't see why a task should spin
> to save a wake up if a wake up has to happen anyway.
What about regular mutexes, happens there too or no?
> > Either way, I think a comment should go on top of the "if (top_waiter !=
> > waiter)" check IMO.
>
> What type of comment?
Comment explaining why "if (top_waiter != waiter)" is essential :-).
thanks,
-Joel
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 19:25 ` Joel Fernandes
@ 2023-03-03 19:38 ` Steven Rostedt
2023-03-03 20:36 ` Qais Yousef
2023-03-04 3:01 ` Joel Fernandes
0 siblings, 2 replies; 32+ messages in thread
From: Steven Rostedt @ 2023-03-03 19:38 UTC (permalink / raw)
To: Joel Fernandes
Cc: John Stultz, LKML, Wei Wang, Midas Chien, Kees Cook,
Anton Vorontsov, Guilherme G. Piccoli, Tony Luck, kernel-team,
Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior
On Fri, 3 Mar 2023 14:25:23 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:
> On Fri, Mar 3, 2023 at 1:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 3 Mar 2023 18:11:34 +0000
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > > In the normal mutex's adaptive spinning, there is no check for if there is a
> > > change in waiter AFAICS (ignoring ww mutex stuff for a second).
> > >
> > > I can see one may want to do that waiter-check, as spinning
> > > indefinitely if the lock owner is on the CPU for too long may result in
> > > excessing power burn. But normal mutex does not seem to do that.
> > >
> > > What makes the rtmutex spin logic different from normal mutex in this
> > > scenario, so that rtmutex wants to do that but normal ones dont?
> >
> > Well, the point of the patch is that I don't think they should be different
> > ;-)
>
> But there's no "waiter change" thing for mutex_spin_on_owner right.
>
> Then, should mutex_spin_on_owner() also add a call to
> __mutex_waiter_is_first() ?
Ah interesting, I missed the __mutex_waiter_is_first() in the mutex code,
where it looks to do basically the same thing as rt_mutex (but slightly
different). From looking at this, it appears that mutex() has FIFO fair
logic, where the second waiter will sleep.
Would be interesting to see why John sees such a huge difference between
normal mutex and rtmutex if they are doing the same thing. One thing is
perhaps the priority logic is causing the issue, where this will not
improve anything.
I wonder if we add spinning to normal mutex for the other waiters if that
would improve things or make them worse?
>
> > > Another thought is, I am wondering if all of them spinning indefinitely might
> > > be Ok for rtmutex as well, since as you mentioned, preemption is enabled. So
> > > adding the if (top_waiter != last_waiter) {...} might be unnecessary? In fact
> > > may be even harmful as you are disabling interrupts in the process.
> >
> > The last patch only does the interrupt disabling if the top waiter changes.
> > Which in practice is seldom.
> >
> > But, I don't think a non top waiter should spin if the top waiter is not
> > running. The top waiter is the one that will get the lock next. If the
> > owner releases the lock and gives it to the top waiter, then it has to go
> > through the wake up of the top waiter.
>
> Correct me if I'm wrong, but I don't think it will go through
> schedule() after spinning, which is what adds the overhead for John.
Only if the lock becomes free.
>
> > I don't see why a task should spin
> > to save a wake up if a wake up has to happen anyway.
>
> What about regular mutexes, happens there too or no?
Yes, but in a FIFO order, where in rt_mutex, a second, higher priority task
can make the first ones sleep. So maybe it's just the priority logic that
is causing the issues.
>
> > > Either way, I think a comment should go on top of the "if (top_waiter !=
> > > waiter)" check IMO.
> >
> > What type of comment?
>
> Comment explaining why "if (top_waiter != waiter)" is essential :-).
You mean, "Don't spin if the next owner is not on any CPU"?
-- Steve
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 19:38 ` Steven Rostedt
@ 2023-03-03 20:36 ` Qais Yousef
2023-03-04 3:21 ` Joel Fernandes
2023-03-04 3:01 ` Joel Fernandes
1 sibling, 1 reply; 32+ messages in thread
From: Qais Yousef @ 2023-03-03 20:36 UTC (permalink / raw)
To: Steven Rostedt
Cc: Joel Fernandes, John Stultz, LKML, Wei Wang, Midas Chien,
Kees Cook, Anton Vorontsov, Guilherme G. Piccoli, Tony Luck,
kernel-team, Thomas Gleixner, Peter Zijlstra,
Sebastian Andrzej Siewior
On 03/03/23 14:38, Steven Rostedt wrote:
> On Fri, 3 Mar 2023 14:25:23 -0500
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
> > On Fri, Mar 3, 2023 at 1:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Fri, 3 Mar 2023 18:11:34 +0000
> > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > > In the normal mutex's adaptive spinning, there is no check for if there is a
> > > > change in waiter AFAICS (ignoring ww mutex stuff for a second).
> > > >
> > > > I can see one may want to do that waiter-check, as spinning
> > > > indefinitely if the lock owner is on the CPU for too long may result in
> > > > excessing power burn. But normal mutex does not seem to do that.
> > > >
> > > > What makes the rtmutex spin logic different from normal mutex in this
> > > > scenario, so that rtmutex wants to do that but normal ones dont?
> > >
> > > Well, the point of the patch is that I don't think they should be different
> > > ;-)
> >
> > But there's no "waiter change" thing for mutex_spin_on_owner right.
> >
> > Then, should mutex_spin_on_owner() also add a call to
> > __mutex_waiter_is_first() ?
>
> Ah interesting, I missed the __mutex_waiter_is_first() in the mutex code,
> where it looks to do basically the same thing as rt_mutex (but slightly
> different). From looking at this, it appears that mutex() has FIFO fair
> logic, where the second waiter will sleep.
>
> Would be interesting to see why John sees such a huge difference between
> normal mutex and rtmutex if they are doing the same thing. One thing is
> perhaps the priority logic is causing the issue, where this will not
> improve anything.
I think that can be a good suspect. If the waiters are RT tasks the root cause
might be starvation issue due to bad priority setup and moving to FIFO just
happens to hide it.
For same priority RT tasks, we should behave as FIFO too AFAICS.
If there are a mix of RT vs CFS; RT will always win of course.
>
> I wonder if we add spinning to normal mutex for the other waiters if that
> would improve things or make them worse?
I see a potential risk depending on how long the worst case scenario for this
optimistic spinning.
RT tasks can prevent all lower priority RT and CFS from running.
CFS tasks will lose some precious bandwidth from their sched_slice() as this
will be accounted for them as RUNNING time even if they were effectively
waiting.
Cheers
--
Qais Yousef
>
> >
> > > > Another thought is, I am wondering if all of them spinning indefinitely might
> > > > be Ok for rtmutex as well, since as you mentioned, preemption is enabled. So
> > > > adding the if (top_waiter != last_waiter) {...} might be unnecessary? In fact
> > > > may be even harmful as you are disabling interrupts in the process.
> > >
> > > The last patch only does the interrupt disabling if the top waiter changes.
> > > Which in practice is seldom.
> > >
> > > But, I don't think a non top waiter should spin if the top waiter is not
> > > running. The top waiter is the one that will get the lock next. If the
> > > owner releases the lock and gives it to the top waiter, then it has to go
> > > through the wake up of the top waiter.
> >
> > Correct me if I'm wrong, but I don't think it will go through
> > schedule() after spinning, which is what adds the overhead for John.
>
> Only if the lock becomes free.
>
> >
> > > I don't see why a task should spin
> > > to save a wake up if a wake up has to happen anyway.
> >
> > What about regular mutexes, happens there too or no?
>
> Yes, but in a FIFO order, where in rt_mutex, a second, higher priority task
> can make the first ones sleep. So maybe it's just the priority logic that
> is causing the issues.
>
> >
> > > > Either way, I think a comment should go on top of the "if (top_waiter !=
> > > > waiter)" check IMO.
> > >
> > > What type of comment?
> >
> > Comment explaining why "if (top_waiter != waiter)" is essential :-).
>
> You mean, "Don't spin if the next owner is not on any CPU"?
>
> -- Steve
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 20:36 ` Qais Yousef
@ 2023-03-04 3:21 ` Joel Fernandes
2023-03-06 19:19 ` Qais Yousef
0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2023-03-04 3:21 UTC (permalink / raw)
To: Qais Yousef
Cc: Steven Rostedt, John Stultz, LKML, Wei Wang, Midas Chien,
Kees Cook, Anton Vorontsov, Guilherme G. Piccoli, Tony Luck,
kernel-team, Thomas Gleixner, Peter Zijlstra,
Sebastian Andrzej Siewior
On Fri, Mar 03, 2023 at 08:36:45PM +0000, Qais Yousef wrote:
> On 03/03/23 14:38, Steven Rostedt wrote:
> > On Fri, 3 Mar 2023 14:25:23 -0500
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > > On Fri, Mar 3, 2023 at 1:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > > >
> > > > On Fri, 3 Mar 2023 18:11:34 +0000
> > > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > > In the normal mutex's adaptive spinning, there is no check for if there is a
> > > > > change in waiter AFAICS (ignoring ww mutex stuff for a second).
> > > > >
> > > > > I can see one may want to do that waiter-check, as spinning
> > > > > indefinitely if the lock owner is on the CPU for too long may result in
> > > > > excessing power burn. But normal mutex does not seem to do that.
> > > > >
> > > > > What makes the rtmutex spin logic different from normal mutex in this
> > > > > scenario, so that rtmutex wants to do that but normal ones dont?
> > > >
> > > > Well, the point of the patch is that I don't think they should be different
> > > > ;-)
> > >
> > > But there's no "waiter change" thing for mutex_spin_on_owner right.
> > >
> > > Then, should mutex_spin_on_owner() also add a call to
> > > __mutex_waiter_is_first() ?
> >
> > Ah interesting, I missed the __mutex_waiter_is_first() in the mutex code,
> > where it looks to do basically the same thing as rt_mutex (but slightly
> > different). From looking at this, it appears that mutex() has FIFO fair
> > logic, where the second waiter will sleep.
> >
> > Would be interesting to see why John sees such a huge difference between
> > normal mutex and rtmutex if they are doing the same thing. One thing is
> > perhaps the priority logic is causing the issue, where this will not
> > improve anything.
>
> I think that can be a good suspect. If the waiters are RT tasks the root cause
> might be starvation issue due to bad priority setup and moving to FIFO just
> happens to hide it.
I wonder if mutex should actually prioritize giving the lock to RT tasks
instead of FIFO, since that's higher priority work. It sounds that's more
'fair'. But that's likely to make John's issue worse.
> For same priority RT tasks, we should behave as FIFO too AFAICS.
>
> If there are a mix of RT vs CFS; RT will always win of course.
>
> >
> > I wonder if we add spinning to normal mutex for the other waiters if that
> > would improve things or make them worse?
>
> I see a potential risk depending on how long the worst case scenario for this
> optimistic spinning.
>
> RT tasks can prevent all lower priority RT and CFS from running.
Agree, I was kind of hoping need_resched() in mutex_spin_on_owner() would
come to the rescue in such a scenario, but obviously not. Modifications to
check_preempt_curr_rt() could obviously aid there but...
> CFS tasks will lose some precious bandwidth from their sched_slice() as this
> will be accounted for them as RUNNING time even if they were effectively
> waiting.
True, but maybe the CFS task is happy to lose some bandwidth and get back to
CPU quickly, than blocking and not getting any work done. ;-)
thanks,
- Joel
>
>
> Cheers
>
> --
> Qais Yousef
>
> >
> > >
> > > > > Another thought is, I am wondering if all of them spinning indefinitely might
> > > > > be Ok for rtmutex as well, since as you mentioned, preemption is enabled. So
> > > > > adding the if (top_waiter != last_waiter) {...} might be unnecessary? In fact
> > > > > may be even harmful as you are disabling interrupts in the process.
> > > >
> > > > The last patch only does the interrupt disabling if the top waiter changes.
> > > > Which in practice is seldom.
> > > >
> > > > But, I don't think a non top waiter should spin if the top waiter is not
> > > > running. The top waiter is the one that will get the lock next. If the
> > > > owner releases the lock and gives it to the top waiter, then it has to go
> > > > through the wake up of the top waiter.
> > >
> > > Correct me if I'm wrong, but I don't think it will go through
> > > schedule() after spinning, which is what adds the overhead for John.
> >
> > Only if the lock becomes free.
> >
> > >
> > > > I don't see why a task should spin
> > > > to save a wake up if a wake up has to happen anyway.
> > >
> > > What about regular mutexes, happens there too or no?
> >
> > Yes, but in a FIFO order, where in rt_mutex, a second, higher priority task
> > can make the first ones sleep. So maybe it's just the priority logic that
> > is causing the issues.
> >
> > >
> > > > > Either way, I think a comment should go on top of the "if (top_waiter !=
> > > > > waiter)" check IMO.
> > > >
> > > > What type of comment?
> > >
> > > Comment explaining why "if (top_waiter != waiter)" is essential :-).
> >
> > You mean, "Don't spin if the next owner is not on any CPU"?
> >
> > -- Steve
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-04 3:21 ` Joel Fernandes
@ 2023-03-06 19:19 ` Qais Yousef
0 siblings, 0 replies; 32+ messages in thread
From: Qais Yousef @ 2023-03-06 19:19 UTC (permalink / raw)
To: Joel Fernandes
Cc: Steven Rostedt, John Stultz, LKML, Wei Wang, Midas Chien,
Kees Cook, Anton Vorontsov, Guilherme G. Piccoli, Tony Luck,
kernel-team, Thomas Gleixner, Peter Zijlstra,
Sebastian Andrzej Siewior
On 03/04/23 03:21, Joel Fernandes wrote:
> On Fri, Mar 03, 2023 at 08:36:45PM +0000, Qais Yousef wrote:
> > On 03/03/23 14:38, Steven Rostedt wrote:
> > > On Fri, 3 Mar 2023 14:25:23 -0500
> > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > > On Fri, Mar 3, 2023 at 1:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > >
> > > > > On Fri, 3 Mar 2023 18:11:34 +0000
> > > > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > >
> > > > > > In the normal mutex's adaptive spinning, there is no check for if there is a
> > > > > > change in waiter AFAICS (ignoring ww mutex stuff for a second).
> > > > > >
> > > > > > I can see one may want to do that waiter-check, as spinning
> > > > > > indefinitely if the lock owner is on the CPU for too long may result in
> > > > > > excessing power burn. But normal mutex does not seem to do that.
> > > > > >
> > > > > > What makes the rtmutex spin logic different from normal mutex in this
> > > > > > scenario, so that rtmutex wants to do that but normal ones dont?
> > > > >
> > > > > Well, the point of the patch is that I don't think they should be different
> > > > > ;-)
> > > >
> > > > But there's no "waiter change" thing for mutex_spin_on_owner right.
> > > >
> > > > Then, should mutex_spin_on_owner() also add a call to
> > > > __mutex_waiter_is_first() ?
> > >
> > > Ah interesting, I missed the __mutex_waiter_is_first() in the mutex code,
> > > where it looks to do basically the same thing as rt_mutex (but slightly
> > > different). From looking at this, it appears that mutex() has FIFO fair
> > > logic, where the second waiter will sleep.
> > >
> > > Would be interesting to see why John sees such a huge difference between
> > > normal mutex and rtmutex if they are doing the same thing. One thing is
> > > perhaps the priority logic is causing the issue, where this will not
> > > improve anything.
> >
> > I think that can be a good suspect. If the waiters are RT tasks the root cause
> > might be starvation issue due to bad priority setup and moving to FIFO just
> > happens to hide it.
>
> I wonder if mutex should actually prioritize giving the lock to RT tasks
> instead of FIFO, since that's higher priority work. It sounds that's more
> 'fair'. But that's likely to make John's issue worse.
It is the right thing to do IMHO, but I guess the implications are just too
hard to tell to enforce it by default yet. Which is I guess why it's all
protected by PREEMPT_RT still.
(I'm not sure but I assumed that logically PREEMPT_RT would convert all mutex
to rt_mutexes by default)
>
> > For same priority RT tasks, we should behave as FIFO too AFAICS.
> >
> > If there are a mix of RT vs CFS; RT will always win of course.
> >
> > >
> > > I wonder if we add spinning to normal mutex for the other waiters if that
> > > would improve things or make them worse?
> >
> > I see a potential risk depending on how long the worst case scenario for this
> > optimistic spinning.
> >
> > RT tasks can prevent all lower priority RT and CFS from running.
>
> Agree, I was kind of hoping need_resched() in mutex_spin_on_owner() would
> come to the rescue in such a scenario, but obviously not. Modifications to
> check_preempt_curr_rt() could obviously aid there but...
>
> > CFS tasks will lose some precious bandwidth from their sched_slice() as this
> > will be accounted for them as RUNNING time even if they were effectively
> > waiting.
>
> True, but maybe the CFS task is happy to lose some bandwidth and get back to
> CPU quickly, than blocking and not getting any work done. ;-)
It depends on the worst case scenario of spinning. If we can ensure it's
bounded to something small, then yeah I don't see an issue :-)
Cheers
--
Qais Yousef
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 19:38 ` Steven Rostedt
2023-03-03 20:36 ` Qais Yousef
@ 2023-03-04 3:01 ` Joel Fernandes
2023-03-04 3:23 ` Joel Fernandes
1 sibling, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2023-03-04 3:01 UTC (permalink / raw)
To: Steven Rostedt
Cc: John Stultz, LKML, Wei Wang, Midas Chien, Kees Cook,
Anton Vorontsov, Guilherme G. Piccoli, Tony Luck, kernel-team,
Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior
Hey Steve,
On Fri, Mar 03, 2023 at 02:38:22PM -0500, Steven Rostedt wrote:
> On Fri, 3 Mar 2023 14:25:23 -0500
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
> > On Fri, Mar 3, 2023 at 1:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Fri, 3 Mar 2023 18:11:34 +0000
> > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > > In the normal mutex's adaptive spinning, there is no check for if there is a
> > > > change in waiter AFAICS (ignoring ww mutex stuff for a second).
> > > >
> > > > I can see one may want to do that waiter-check, as spinning
> > > > indefinitely if the lock owner is on the CPU for too long may result in
> > > > excessing power burn. But normal mutex does not seem to do that.
> > > >
> > > > What makes the rtmutex spin logic different from normal mutex in this
> > > > scenario, so that rtmutex wants to do that but normal ones dont?
> > >
> > > Well, the point of the patch is that I don't think they should be different
> > > ;-)
> >
> > But there's no "waiter change" thing for mutex_spin_on_owner right.
> >
> > Then, should mutex_spin_on_owner() also add a call to
> > __mutex_waiter_is_first() ?
>
> Ah interesting, I missed the __mutex_waiter_is_first() in the mutex code,
> where it looks to do basically the same thing as rt_mutex (but slightly
> different).
True, I concur!
> From looking at this, it appears that mutex() has FIFO fair
> logic, where the second waiter will sleep.
>
> Would be interesting to see why John sees such a huge difference between
> normal mutex and rtmutex if they are doing the same thing. One thing is
> perhaps the priority logic is causing the issue, where this will not
> improve anything.
> I wonder if we add spinning to normal mutex for the other waiters if that
> would improve things or make them worse?
Yeah it could improve things (or make them worse ;-). In that approach, I
guess those other waiters will not be spinning for too long once the first
waiter gets the lock, as mutex_spin_on_owner() it will break out of the spin:
while (__mutex_owner(lock) == owner) {
...
}
But yeah it sounds like a plausible idea.
> > > > Another thought is, I am wondering if all of them spinning indefinitely might
> > > > be Ok for rtmutex as well, since as you mentioned, preemption is enabled. So
> > > > adding the if (top_waiter != last_waiter) {...} might be unnecessary? In fact
> > > > may be even harmful as you are disabling interrupts in the process.
> > >
> > > The last patch only does the interrupt disabling if the top waiter changes.
> > > Which in practice is seldom.
> > >
> > > But, I don't think a non top waiter should spin if the top waiter is not
> > > running. The top waiter is the one that will get the lock next. If the
> > > owner releases the lock and gives it to the top waiter, then it has to go
> > > through the wake up of the top waiter.
Ah ok. I see what you're doing now!
I guess that check will help whenever the top-waiter keeps changing. In that
case you want only the latest top-waiter as the spinner, all while the lock
owner is not changing.
> > Correct me if I'm wrong, but I don't think it will go through
> > schedule() after spinning, which is what adds the overhead for John.
>
> Only if the lock becomes free.
Ah yeah, true!
> > > I don't see why a task should spin
> > > to save a wake up if a wake up has to happen anyway.
> >
> > What about regular mutexes, happens there too or no?
>
> Yes, but in a FIFO order, where in rt_mutex, a second, higher priority task
> can make the first ones sleep. So maybe it's just the priority logic that
> is causing the issues.
True! So in the FIFO thing, there's no way a high prio task can get ahead in
line of the first waiter, makes sense.
> > > > Either way, I think a comment should go on top of the "if (top_waiter !=
> > > > waiter)" check IMO.
> > >
> > > What type of comment?
> >
> > Comment explaining why "if (top_waiter != waiter)" is essential :-).
>
Maybe "/* Only the top waiter needs to spin. If we are no longer the
top-waiter, no point in spinning, as we do not get the lock next anyway. */"
?
thanks,
- Joel
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-04 3:01 ` Joel Fernandes
@ 2023-03-04 3:23 ` Joel Fernandes
0 siblings, 0 replies; 32+ messages in thread
From: Joel Fernandes @ 2023-03-04 3:23 UTC (permalink / raw)
To: Steven Rostedt
Cc: John Stultz, LKML, Wei Wang, Midas Chien, Kees Cook,
Anton Vorontsov, Guilherme G. Piccoli, Tony Luck, kernel-team,
Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior
On Sat, Mar 04, 2023 at 03:01:30AM +0000, Joel Fernandes wrote:
[...]
> > > > > Either way, I think a comment should go on top of the "if (top_waiter !=
> > > > > waiter)" check IMO.
> > > >
> > > > What type of comment?
> > >
> > > Comment explaining why "if (top_waiter != waiter)" is essential :-).
> >
>
> Maybe "/* Only the top waiter needs to spin. If we are no longer the
> top-waiter, no point in spinning, as we do not get the lock next anyway. */"
>
> ?
And it could be added to that comment that, we want to continue spinning as
long as the top-waiter is still on the CPU (even if we are no longer the
top-waiter).
thanks,
- Joel
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 18:11 ` Joel Fernandes
2023-03-03 18:37 ` Steven Rostedt
@ 2023-03-07 14:08 ` Peter Zijlstra
2023-03-07 20:19 ` Joel Fernandes
1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2023-03-07 14:08 UTC (permalink / raw)
To: Joel Fernandes
Cc: Steven Rostedt, John Stultz, LKML, Wei Wang, Midas Chien,
Kees Cook, Anton Vorontsov, Guilherme G. Piccoli, Tony Luck,
kernel-team, Thomas Gleixner, Sebastian Andrzej Siewior
On Fri, Mar 03, 2023 at 06:11:34PM +0000, Joel Fernandes wrote:
> What makes the rtmutex spin logic different from normal mutex in this
> scenario, so that rtmutex wants to do that but normal ones dont?
Regular mutex uses osq 'lock' to serialize waiters and only the top
spinner gets to spin on the mutex itself, this greatly reduces the
contention on the mutex.
OSQ is FIFO, which is not what RT-mutex needs.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-07 14:08 ` Peter Zijlstra
@ 2023-03-07 20:19 ` Joel Fernandes
0 siblings, 0 replies; 32+ messages in thread
From: Joel Fernandes @ 2023-03-07 20:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, John Stultz, LKML, Wei Wang, Midas Chien,
Kees Cook, Anton Vorontsov, Guilherme G. Piccoli, Tony Luck,
kernel-team, Thomas Gleixner, Sebastian Andrzej Siewior
On Tue, Mar 7, 2023 at 9:09 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Mar 03, 2023 at 06:11:34PM +0000, Joel Fernandes wrote:
> > What makes the rtmutex spin logic different from normal mutex in this
> > scenario, so that rtmutex wants to do that but normal ones dont?
>
> Regular mutex uses osq 'lock' to serialize waiters and only the top
> spinner gets to spin on the mutex itself, this greatly reduces the
> contention on the mutex.
>
> OSQ is FIFO, which is not what RT-mutex needs.
Got it, so basically OSQ ensures fairness as its FIFO and also reduces
lock contention because I am guessing the OSQ-spinners are spinning on
a per-spinner MCS node (that per-CPU optimistic_spin_node it appears).
This makes perfect sense now, thank you Peter!!!
- Joel
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 1:01 ` Steven Rostedt
2023-03-03 18:11 ` Joel Fernandes
@ 2023-03-06 18:30 ` John Stultz
2023-03-08 1:31 ` Steven Rostedt
2 siblings, 0 replies; 32+ messages in thread
From: John Stultz @ 2023-03-06 18:30 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Wei Wang, Midas Chien, Kees Cook, Anton Vorontsov,
Guilherme G. Piccoli, Tony Luck, kernel-team, Thomas Gleixner,
Peter Zijlstra, Sebastian Andrzej Siewior
On Thu, Mar 2, 2023 at 5:01 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 2 Mar 2023 16:56:13 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Knowing that rcu_read_lock() keeps the tasks safe, I made the optimization
> > to only grab the spinlock (and disable interrupts) once, or whenever the
> > top waiter changes.
>
> v3 as I found that there were too places to test for top waiter that had to
> be removed:
Hey Steven,
Unfortunately previous versions didn't improve the situation for the
reporter, and this version is causing crashes (BUG at
kernel/locking/rtmutex_common.h:118!) for them.
I'm going to spend some time today to get a local reproducer so I can
tinker a bit here as well.
thanks
-john
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-03 1:01 ` Steven Rostedt
2023-03-03 18:11 ` Joel Fernandes
2023-03-06 18:30 ` John Stultz
@ 2023-03-08 1:31 ` Steven Rostedt
2023-03-08 20:04 ` John Stultz
2 siblings, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2023-03-08 1:31 UTC (permalink / raw)
To: John Stultz
Cc: LKML, Wei Wang, Midas Chien, Kees Cook, Anton Vorontsov,
Guilherme G. Piccoli, Tony Luck, kernel-team, Thomas Gleixner,
Peter Zijlstra, Sebastian Andrzej Siewior
On Thu, 2 Mar 2023 20:01:36 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> @@ -1421,11 +1425,23 @@ static bool rtmutex_spin_on_owner(struct rt_mutex_base *lock,
> * for CONFIG_PREEMPT_RCU=y)
> * - the VCPU on which owner runs is preempted
> */
> - if (!owner_on_cpu(owner) || need_resched() ||
> - !rt_mutex_waiter_is_top_waiter(lock, waiter)) {
> + if (!owner_on_cpu(owner) || need_resched()) {
> res = false;
> break;
> }
> + top_waiter = rt_mutex_top_waiter(lock);
rt_mutex_top_waiter() can not be called outside the wait_lock, as it may
trigger that BUG_ON() you saw.
New patch below.
> + if (top_waiter != waiter) {
> + if (top_waiter != last_waiter) {
> + raw_spin_lock_irq(&lock->wait_lock);
> + last_waiter = rt_mutex_top_waiter(lock);
> + top_task = last_waiter->task;
> + raw_spin_unlock_irq(&lock->wait_lock);
> + }
> + if (!owner_on_cpu(top_task)) {
> + res = false;
> + break;
> + }
> + }
-- Steve
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 7779ee8abc2a..f7b0cc3be20e 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1362,8 +1362,11 @@ static bool rtmutex_spin_on_owner(struct rt_mutex_base *lock,
struct rt_mutex_waiter *waiter,
struct task_struct *owner)
{
+ struct rt_mutex_waiter *top_waiter = NULL;
+ struct task_struct *top_task = NULL;
bool res = true;
+ /* rcu_read_lock keeps task_structs around */
rcu_read_lock();
for (;;) {
/* If owner changed, trylock again. */
@@ -1384,11 +1387,22 @@ static bool rtmutex_spin_on_owner(struct rt_mutex_base *lock,
* for CONFIG_PREEMPT_RCU=y)
* - the VCPU on which owner runs is preempted
*/
- if (!owner_on_cpu(owner) || need_resched() ||
- !rt_mutex_waiter_is_top_waiter(lock, waiter)) {
+ if (!owner_on_cpu(owner) || need_resched()) {
res = false;
break;
}
+ if (!rt_mutex_waiter_is_top_waiter(lock, waiter)) {
+ if (!rt_mutex_waiter_is_top_waiter(lock, top_waiter)) {
+ raw_spin_lock_irq(&lock->wait_lock);
+ top_waiter = rt_mutex_top_waiter(lock);
+ top_task = top_waiter->task;
+ raw_spin_unlock_irq(&lock->wait_lock);
+ }
+ if (!owner_on_cpu(top_task)) {
+ res = false;
+ break;
+ }
+ }
cpu_relax();
}
rcu_read_unlock();
@@ -1510,10 +1524,7 @@ static int __sched rt_mutex_slowlock_block(struct rt_mutex_base *lock,
break;
}
- if (waiter == rt_mutex_top_waiter(lock))
- owner = rt_mutex_owner(lock);
- else
- owner = NULL;
+ owner = rt_mutex_owner(lock);
raw_spin_unlock_irq(&lock->wait_lock);
if (!owner || !rtmutex_spin_on_owner(lock, waiter, owner))
@@ -1699,10 +1710,7 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
if (try_to_take_rt_mutex(lock, current, &waiter))
break;
- if (&waiter == rt_mutex_top_waiter(lock))
- owner = rt_mutex_owner(lock);
- else
- owner = NULL;
+ owner = rt_mutex_owner(lock);
raw_spin_unlock_irq(&lock->wait_lock);
if (!owner || !rtmutex_spin_on_owner(lock, &waiter, owner))
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-08 1:31 ` Steven Rostedt
@ 2023-03-08 20:04 ` John Stultz
2023-03-08 20:41 ` Steven Rostedt
0 siblings, 1 reply; 32+ messages in thread
From: John Stultz @ 2023-03-08 20:04 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Wei Wang, Midas Chien, Kees Cook, Anton Vorontsov,
Guilherme G. Piccoli, Tony Luck, kernel-team, Thomas Gleixner,
Peter Zijlstra, Sebastian Andrzej Siewior
On Tue, Mar 7, 2023 at 5:31 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 2 Mar 2023 20:01:36 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > @@ -1421,11 +1425,23 @@ static bool rtmutex_spin_on_owner(struct rt_mutex_base *lock,
> > * for CONFIG_PREEMPT_RCU=y)
> > * - the VCPU on which owner runs is preempted
> > */
> > - if (!owner_on_cpu(owner) || need_resched() ||
> > - !rt_mutex_waiter_is_top_waiter(lock, waiter)) {
> > + if (!owner_on_cpu(owner) || need_resched()) {
> > res = false;
> > break;
> > }
> > + top_waiter = rt_mutex_top_waiter(lock);
>
> rt_mutex_top_waiter() can not be called outside the wait_lock, as it may
> trigger that BUG_ON() you saw.
>
> New patch below.
Hey Steven!
Thanks for the new version! It avoids the crash issue. However, with
my sef-created reproducer, I was still seeing similar regression going
between mutex to rtmutex.
I'm interested in continuing to see if we can further tweak it, but
I've got some other work I need to focus on, so I think I'm going to
advocate for the revert in the short-term and look at finer grained
locking (along with rtmutex to address the priority inversion issue)
in the longer term.
thanks
-john
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] pstore: Revert pmsg_lock back to a normal mutex
2023-03-08 20:04 ` John Stultz
@ 2023-03-08 20:41 ` Steven Rostedt
0 siblings, 0 replies; 32+ messages in thread
From: Steven Rostedt @ 2023-03-08 20:41 UTC (permalink / raw)
To: John Stultz
Cc: LKML, Wei Wang, Midas Chien, Kees Cook, Anton Vorontsov,
Guilherme G. Piccoli, Tony Luck, kernel-team, Thomas Gleixner,
Peter Zijlstra, Sebastian Andrzej Siewior
On Wed, 8 Mar 2023 12:04:23 -0800
John Stultz <jstultz@google.com> wrote:
> I'm interested in continuing to see if we can further tweak it, but
> I've got some other work I need to focus on, so I think I'm going to
> advocate for the revert in the short-term and look at finer grained
> locking (along with rtmutex to address the priority inversion issue)
> in the longer term.
Yeah, I would suggest the same. I would still like to see what the
difference is. Because I believe this will also be an issue for PREEMPT_RT.
-- Steve
^ permalink raw reply [flat|nested] 32+ messages in thread