All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
  2021-01-07  9:18 [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged Ran Wang
@ 2021-01-07  9:13 ` Ran Wang
  2021-01-07 10:45 ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Ran Wang @ 2021-01-07  9:13 UTC (permalink / raw)
  To: Ran Wang, Sebastian Siewior, Thomas Gleixner
  Cc: Jiafei Pan, linux-rt-users, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, linux-kernel

Hi,

On Thursday, January 7, 2021 5:19 PM, Ran Wang wrote:
> 
> When doing CPU un-plug stress test, function smpboot_park_threads() would get call to park kernel threads (which including ksoftirqd) on
> that CPU core, and function wait_task_inactive() would yield for those queued
> task(s) by calling schedule_hrtimerout() with mode of HRTIMER_MODE_REL.
> 
> stack trace:
> ...
> smpboot_thread_fn
>     cpuhp_thread_fun
>         cpuhp_invoke_callback
>             smpboot_park_threads
>               smpboot_park_thread: ksoftirqd/1
>                 kthread_park
>                   wait_task_inactive
>                      schedule_hrtimerout
> 
> However, when PREEMPT_RT is set, this would cause a pending issue since
> schedule_hrtimerout() depend on thread ksoftirqd to complete related work if it using HRTIMER_MODE_SOFT. So force using
> HRTIMER_MODE_HARD in such case.

This issue was observed on LX2160ARDB (arm64, 16 A72 cores) when selecting PREEMPT_RT, 
non-RT kernel works fine.And I could verify that fix on both linux-5.6.y-rt and linux-5.4.y-rt.
But for linux-5.9.y-rt and linux-5.10.y-rt, looks there are other issues which blocking
verification currently. Below is the steps for issue reproducing:

1. Kernel menuconfig:
CONFIG_QORIQ_CPUFREQ=y

CONFIG_HAVE_PREEMPT_LAZY=y
CONFIG_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_RT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y

2. Shell commands (Issue would happen within roughly 400 rounds of below loop)
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu8/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu9/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu10/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu11/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu12/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu13/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu14/cpufreq/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpu15/cpufreq/scaling_governor

count=1
while [ $? -eq 0 ]
do
        echo "$count th test"
        sleep 3
        let "count=count+1"

        echo 0 > /sys/devices/system/cpu/cpu0/online
        echo 0 > /sys/devices/system/cpu/cpu1/online
        echo 0 > /sys/devices/system/cpu/cpu2/online
        echo 0 > /sys/devices/system/cpu/cpu3/online
        echo 0 > /sys/devices/system/cpu/cpu4/online
        echo 0 > /sys/devices/system/cpu/cpu5/online
        echo 0 > /sys/devices/system/cpu/cpu6/online
        echo 0 > /sys/devices/system/cpu/cpu7/online
        echo 0 > /sys/devices/system/cpu/cpu8/online
        echo 0 > /sys/devices/system/cpu/cpu9/online
        echo 0 > /sys/devices/system/cpu/cpu10/online
        echo 0 > /sys/devices/system/cpu/cpu11/online
        echo 0 > /sys/devices/system/cpu/cpu12/online
        echo 0 > /sys/devices/system/cpu/cpu13/online
        echo 0 > /sys/devices/system/cpu/cpu14/online

        echo 1 > /sys/devices/system/cpu/cpu0/online
        echo 1 > /sys/devices/system/cpu/cpu1/online
        echo 1 > /sys/devices/system/cpu/cpu2/online
        echo 1 > /sys/devices/system/cpu/cpu3/online
        echo 1 > /sys/devices/system/cpu/cpu4/online
        echo 1 > /sys/devices/system/cpu/cpu5/online
        echo 1 > /sys/devices/system/cpu/cpu6/online
        echo 1 > /sys/devices/system/cpu/cpu7/online
        echo 1 > /sys/devices/system/cpu/cpu8/online
        echo 1 > /sys/devices/system/cpu/cpu9/online
        echo 1 > /sys/devices/system/cpu/cpu10/online
        echo 1 > /sys/devices/system/cpu/cpu11/online
        echo 1 > /sys/devices/system/cpu/cpu12/online
        echo 1 > /sys/devices/system/cpu/cpu13/online
        echo 1 > /sys/devices/system/cpu/cpu14/online
done

To be honest, I am not sure how non-RT kernel could avoid this issue. Could anybody give some input/suggestion on this?
Thank you.

Regards,
Ran
 
> Suggested-by: Jiafei Pan <jiafei.pan@nxp.com>
> Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
> ---
>  kernel/sched/core.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 792da55..4cc742a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2054,10 +2054,15 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
>  			ktime_t to = NSEC_PER_SEC / HZ;
> 
>  			set_current_state(TASK_UNINTERRUPTIBLE);
> -			schedule_hrtimeout(&to, HRTIMER_MODE_REL);
> +
> +			if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
> +			    !strncmp(p->comm, "ksoftirqd/", 10))
> +				schedule_hrtimeout(&to,
> +					HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
> +			else
> +				schedule_hrtimeout(&to, HRTIMER_MODE_REL);
>  			continue;
>  		}
> -
>  		/*
>  		 * Ahh, all good. It wasn't running, and it wasn't
>  		 * runnable, which means that it will never become
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
@ 2021-01-07  9:18 Ran Wang
  2021-01-07  9:13 ` Ran Wang
  2021-01-07 10:45 ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Ran Wang @ 2021-01-07  9:18 UTC (permalink / raw)
  To: Sebastian Siewior, Thomas Gleixner
  Cc: Jiafei Pan, linux-rt-users, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, linux-kernel, Ran Wang

When doing CPU un-plug stress test, function smpboot_park_threads() would
get call to park kernel threads (which including ksoftirqd) on that
CPU core, and function wait_task_inactive() would yield for those queued
task(s) by calling schedule_hrtimerout() with mode of HRTIMER_MODE_REL.

stack trace:
...
smpboot_thread_fn
    cpuhp_thread_fun
        cpuhp_invoke_callback
            smpboot_park_threads
              smpboot_park_thread: ksoftirqd/1
                kthread_park
                  wait_task_inactive
                     schedule_hrtimerout

However, when PREEMPT_RT is set, this would cause a pending issue since
schedule_hrtimerout() depend on thread ksoftirqd to complete related
work if it using HRTIMER_MODE_SOFT. So force using HRTIMER_MODE_HARD
in such case.

Suggested-by: Jiafei Pan <jiafei.pan@nxp.com>
Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
---
 kernel/sched/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 792da55..4cc742a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2054,10 +2054,15 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
 			ktime_t to = NSEC_PER_SEC / HZ;
 
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			schedule_hrtimeout(&to, HRTIMER_MODE_REL);
+
+			if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
+			    !strncmp(p->comm, "ksoftirqd/", 10))
+				schedule_hrtimeout(&to,
+					HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
+			else
+				schedule_hrtimeout(&to, HRTIMER_MODE_REL);
 			continue;
 		}
-
 		/*
 		 * Ahh, all good. It wasn't running, and it wasn't
 		 * runnable, which means that it will never become
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
  2021-01-07  9:18 [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged Ran Wang
  2021-01-07  9:13 ` Ran Wang
@ 2021-01-07 10:45 ` Peter Zijlstra
  2021-01-07 15:28   ` Sebastian Siewior
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2021-01-07 10:45 UTC (permalink / raw)
  To: Ran Wang
  Cc: Sebastian Siewior, Thomas Gleixner, Jiafei Pan, linux-rt-users,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On Thu, Jan 07, 2021 at 05:18:41PM +0800, Ran Wang wrote:
> +
> +			if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
> +			    !strncmp(p->comm, "ksoftirqd/", 10))
> +				schedule_hrtimeout(&to,
> +					HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
> +			else
> +				schedule_hrtimeout(&to, HRTIMER_MODE_REL);

This is horrific, why did you not self-censor and spare me the mental
anguish of having to formulate a CoC compliant response?

It also violates coding style, but given the total lack of any sense,
that seems like a minor detail.

Why can't we use HRTIMER_MODE_HARD unconditionally?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
  2021-01-07 10:45 ` Peter Zijlstra
@ 2021-01-07 15:28   ` Sebastian Siewior
  2021-01-08  8:45     ` Ran Wang
  2021-01-08  9:32     ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Sebastian Siewior @ 2021-01-07 15:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ran Wang, Thomas Gleixner, Jiafei Pan, linux-rt-users,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On 2021-01-07 11:45:39 [+0100], Peter Zijlstra wrote:
> On Thu, Jan 07, 2021 at 05:18:41PM +0800, Ran Wang wrote:
> > +
> > +			if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
> > +			    !strncmp(p->comm, "ksoftirqd/", 10))
> > +				schedule_hrtimeout(&to,
> > +					HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
> > +			else
> > +				schedule_hrtimeout(&to, HRTIMER_MODE_REL);
> 
> This is horrific, why did you not self-censor and spare me the mental
> anguish of having to formulate a CoC compliant response?
> 
> It also violates coding style, but given the total lack of any sense,
> that seems like a minor detail.
> 
> Why can't we use HRTIMER_MODE_HARD unconditionally?

I had a similar patch in -RT and dropped it in v5.10-rc7-rt16.
It was added because RT couldn't boot since creating the boot-threads
didn't work before the ksoftirqd was up. This was fixed by commit
   26c7295be0c5e ("kthread: Do not preempt current task if it is going to call schedule()")

and live was good again.
tglx (also) suggested to add HRTIMER_MODE_HARD unconditionally (it
looked at SYSTEM_STATE back then) and I was only worried some abuse via
userland.
This sleep can be triggered by ptrace/strace() and with brief testing I
can trigger the sleep there but I don't get it anywhere near where I
would notice it with cyclictest.

Sebastian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
  2021-01-07 15:28   ` Sebastian Siewior
@ 2021-01-08  8:45     ` Ran Wang
  2021-01-08  9:05       ` Sebastian Siewior
  2021-01-08  9:32     ` Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Ran Wang @ 2021-01-08  8:45 UTC (permalink / raw)
  To: Sebastian Siewior, Peter Zijlstra
  Cc: Thomas Gleixner, Jiafei Pan, linux-rt-users, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, linux-kernel

Hi Sebastian, Peter

Thursday, January 7, 2021 11:29 PM, Sebastian Siewior wrote:
> 
> On 2021-01-07 11:45:39 [+0100], Peter Zijlstra wrote:
> > On Thu, Jan 07, 2021 at 05:18:41PM +0800, Ran Wang wrote:
> > > +
> > > +			if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
> > > +			    !strncmp(p->comm, "ksoftirqd/", 10))
> > > +				schedule_hrtimeout(&to,
> > > +					HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
> > > +			else
> > > +				schedule_hrtimeout(&to, HRTIMER_MODE_REL);
> >
> > This is horrific, why did you not self-censor and spare me the mental
> > anguish of having to formulate a CoC compliant response?
> >
> > It also violates coding style, but given the total lack of any sense,
> > that seems like a minor detail.
> >
> > Why can't we use HRTIMER_MODE_HARD unconditionally?
> 
> I had a similar patch in -RT and dropped it in v5.10-rc7-rt16.
> It was added because RT couldn't boot since creating the boot-threads didn't work before the ksoftirqd was up. This was fixed by commit
>    26c7295be0c5e ("kthread: Do not preempt current task if it is going to call schedule()")

I tried applying above commit to linux-5.6.y-rt, it could resolve my problem on LX2160ARDB, THANKS!

> and live was good again.
> tglx (also) suggested to add HRTIMER_MODE_HARD unconditionally (it looked at SYSTEM_STATE back then) and I was only worried some
> abuse via userland.

Got it.

Regards,
Ran

> This sleep can be triggered by ptrace/strace() and with brief testing I can trigger the sleep there but I don't get it anywhere near where I
> would notice it with cyclictest.
> 
> Sebastian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
  2021-01-08  8:45     ` Ran Wang
@ 2021-01-08  9:05       ` Sebastian Siewior
  2021-01-08  9:27         ` Ran Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Siewior @ 2021-01-08  9:05 UTC (permalink / raw)
  To: Ran Wang
  Cc: Peter Zijlstra, Thomas Gleixner, Jiafei Pan, linux-rt-users,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On 2021-01-08 08:45:14 [+0000], Ran Wang wrote:
> Hi Sebastian, Peter
Hi,

> > I had a similar patch in -RT and dropped it in v5.10-rc7-rt16.
> > It was added because RT couldn't boot since creating the boot-threads didn't work before the ksoftirqd was up. This was fixed by commit
> >    26c7295be0c5e ("kthread: Do not preempt current task if it is going to call schedule()")
> 
> I tried applying above commit to linux-5.6.y-rt, it could resolve my problem on LX2160ARDB, THANKS!

so in other words all this could have been avoided by using a supported
or maintained RT series. The v5.4 series has this patch, v5.6 isn't
maintained anymore so it is likely that there is more missing.

Sebastian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
  2021-01-08  9:05       ` Sebastian Siewior
@ 2021-01-08  9:27         ` Ran Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Ran Wang @ 2021-01-08  9:27 UTC (permalink / raw)
  To: Sebastian Siewior
  Cc: Peter Zijlstra, Thomas Gleixner, Jiafei Pan, linux-rt-users,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

Hi Sebastian,

On Friday, January 8, 2021 5:05 PM, Sebastian Siewior wrote:
> 
> On 2021-01-08 08:45:14 [+0000], Ran Wang wrote:
> > Hi Sebastian, Peter
> Hi,
> 
> > > I had a similar patch in -RT and dropped it in v5.10-rc7-rt16.
> > > It was added because RT couldn't boot since creating the boot-threads didn't work before the ksoftirqd was up. This was fixed by commit
> > >    26c7295be0c5e ("kthread: Do not preempt current task if it is
> > > going to call schedule()")
> >
> > I tried applying above commit to linux-5.6.y-rt, it could resolve my problem on LX2160ARDB, THANKS!
> 
> so in other words all this could have been avoided by using a supported or maintained RT series. The v5.4 series has this patch, v5.6 isn't
> maintained anymore so it is likely that there is more missing.

Thanks for let me know this.

The reason I trying linux-5.6-rt is that I have encountered other more serious issues on later RT version (even with v5.10-rc7-rt16), one of them
is CPU hot plug got stuck in irq_work_sync() which called by sugov_stop(), failure happen at 1st loop stress test every time.
I will try to collect more information and create another mail thread later.

Thanks & Regards,
Ran

> Sebastian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
  2021-01-07 15:28   ` Sebastian Siewior
  2021-01-08  8:45     ` Ran Wang
@ 2021-01-08  9:32     ` Peter Zijlstra
  2021-01-08 11:19       ` Sebastian Siewior
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2021-01-08  9:32 UTC (permalink / raw)
  To: Sebastian Siewior
  Cc: Ran Wang, Thomas Gleixner, Jiafei Pan, linux-rt-users,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On Thu, Jan 07, 2021 at 04:28:43PM +0100, Sebastian Siewior wrote:
> This sleep can be triggered by ptrace/strace() and with brief testing I
> can trigger the sleep there but I don't get it anywhere near where I
> would notice it with cyclictest.

It's a single task wakeup (the caller), doing that from hardirq context
really should not be a problem, we do lots of that in RT already.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
  2021-01-08  9:32     ` Peter Zijlstra
@ 2021-01-08 11:19       ` Sebastian Siewior
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Siewior @ 2021-01-08 11:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ran Wang, Thomas Gleixner, Jiafei Pan, linux-rt-users,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel

On 2021-01-08 10:32:36 [+0100], Peter Zijlstra wrote:
> It's a single task wakeup (the caller), doing that from hardirq context
> really should not be a problem, we do lots of that in RT already.

I'm not worry about that single wakeup but about an artificial case
where you manage to accumulate multiple single wake ups in a short
time frame.

Sebastian

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-01-08 11:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-07  9:18 [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged Ran Wang
2021-01-07  9:13 ` Ran Wang
2021-01-07 10:45 ` Peter Zijlstra
2021-01-07 15:28   ` Sebastian Siewior
2021-01-08  8:45     ` Ran Wang
2021-01-08  9:05       ` Sebastian Siewior
2021-01-08  9:27         ` Ran Wang
2021-01-08  9:32     ` Peter Zijlstra
2021-01-08 11:19       ` Sebastian Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.