[PATCH 1/1] sched/core: Fix stuck on completion for affine_move

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-09-27  3:34 ` Kuyo Chang
  0 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang @ 2023-09-27  3:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Matthias Brugger,
	AngeloGioacchino Del Regno
  Cc: wsd_upstream, kuyo chang, linux-kernel, linux-arm-kernel, linux-mediatek

From: kuyo chang <kuyo.chang@mediatek.com>

[Syndrome] hung detect shows below warning msg
[ 4320.666557] [   T56] khungtaskd: [name:hung_task&]INFO: task stressapptest:17803 blocked for more than 3600 seconds.
[ 4320.666589] [   T56] khungtaskd: [name:core&]task:stressapptest   state:D stack:0     pid:17803 ppid:17579  flags:0x04000008
[ 4320.666601] [   T56] khungtaskd: Call trace:
[ 4320.666607] [   T56] khungtaskd:  __switch_to+0x17c/0x338
[ 4320.666642] [   T56] khungtaskd:  __schedule+0x54c/0x8ec
[ 4320.666651] [   T56] khungtaskd:  schedule+0x74/0xd4
[ 4320.666656] [   T56] khungtaskd:  schedule_timeout+0x34/0x108
[ 4320.666672] [   T56] khungtaskd:  do_wait_for_common+0xe0/0x154
[ 4320.666678] [   T56] khungtaskd:  wait_for_completion+0x44/0x58
[ 4320.666681] [   T56] khungtaskd:  __set_cpus_allowed_ptr_locked+0x344/0x730
[ 4320.666702] [   T56] khungtaskd:  __sched_setaffinity+0x118/0x160
[ 4320.666709] [   T56] khungtaskd:  sched_setaffinity+0x10c/0x248
[ 4320.666715] [   T56] khungtaskd:  __arm64_sys_sched_setaffinity+0x15c/0x1c0
[ 4320.666719] [   T56] khungtaskd:  invoke_syscall+0x3c/0xf8
[ 4320.666743] [   T56] khungtaskd:  el0_svc_common+0xb0/0xe8
[ 4320.666749] [   T56] khungtaskd:  do_el0_svc+0x28/0xa8
[ 4320.666755] [   T56] khungtaskd:  el0_svc+0x28/0x9c
[ 4320.666761] [   T56] khungtaskd:  el0t_64_sync_handler+0x7c/0xe4
[ 4320.666766] [   T56] khungtaskd:  el0t_64_sync+0x18c/0x190

[Analysis]

After add some debug footprint massage, this issue happened at stopper
disable case.
It cannot exec migration_cpu_stop fun to complete migration.
This will cause stuck on wait_for_completion.

Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
---
 kernel/sched/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1dc0b0287e30..98c217a1caa0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3041,8 +3041,9 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		task_rq_unlock(rq, p, rf);
 
 		if (!stop_pending) {
-			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-					    &pending->arg, &pending->stop_work);
+			if (!stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+					    &pending->arg, &pending->stop_work))
+				return -ENOENT;
 		}
 
 		if (flags & SCA_MIGRATE_ENABLE)
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-09-27  3:34 ` Kuyo Chang
  0 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang @ 2023-09-27  3:34 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Matthias Brugger,
	AngeloGioacchino Del Regno
  Cc: wsd_upstream, kuyo chang, linux-kernel, linux-arm-kernel, linux-mediatek

From: kuyo chang <kuyo.chang@mediatek.com>

[Syndrome] hung detect shows below warning msg
[ 4320.666557] [   T56] khungtaskd: [name:hung_task&]INFO: task stressapptest:17803 blocked for more than 3600 seconds.
[ 4320.666589] [   T56] khungtaskd: [name:core&]task:stressapptest   state:D stack:0     pid:17803 ppid:17579  flags:0x04000008
[ 4320.666601] [   T56] khungtaskd: Call trace:
[ 4320.666607] [   T56] khungtaskd:  __switch_to+0x17c/0x338
[ 4320.666642] [   T56] khungtaskd:  __schedule+0x54c/0x8ec
[ 4320.666651] [   T56] khungtaskd:  schedule+0x74/0xd4
[ 4320.666656] [   T56] khungtaskd:  schedule_timeout+0x34/0x108
[ 4320.666672] [   T56] khungtaskd:  do_wait_for_common+0xe0/0x154
[ 4320.666678] [   T56] khungtaskd:  wait_for_completion+0x44/0x58
[ 4320.666681] [   T56] khungtaskd:  __set_cpus_allowed_ptr_locked+0x344/0x730
[ 4320.666702] [   T56] khungtaskd:  __sched_setaffinity+0x118/0x160
[ 4320.666709] [   T56] khungtaskd:  sched_setaffinity+0x10c/0x248
[ 4320.666715] [   T56] khungtaskd:  __arm64_sys_sched_setaffinity+0x15c/0x1c0
[ 4320.666719] [   T56] khungtaskd:  invoke_syscall+0x3c/0xf8
[ 4320.666743] [   T56] khungtaskd:  el0_svc_common+0xb0/0xe8
[ 4320.666749] [   T56] khungtaskd:  do_el0_svc+0x28/0xa8
[ 4320.666755] [   T56] khungtaskd:  el0_svc+0x28/0x9c
[ 4320.666761] [   T56] khungtaskd:  el0t_64_sync_handler+0x7c/0xe4
[ 4320.666766] [   T56] khungtaskd:  el0t_64_sync+0x18c/0x190

[Analysis]

After add some debug footprint massage, this issue happened at stopper
disable case.
It cannot exec migration_cpu_stop fun to complete migration.
This will cause stuck on wait_for_completion.

Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
---
 kernel/sched/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1dc0b0287e30..98c217a1caa0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3041,8 +3041,9 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		task_rq_unlock(rq, p, rf);
 
 		if (!stop_pending) {
-			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-					    &pending->arg, &pending->stop_work);
+			if (!stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+					    &pending->arg, &pending->stop_work))
+				return -ENOENT;
 		}
 
 		if (flags & SCA_MIGRATE_ENABLE)
-- 
2.18.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
  2023-09-27  3:34 ` Kuyo Chang
@ 2023-09-27  8:08   ` Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-09-27  8:08 UTC (permalink / raw)
  To: Kuyo Chang
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Matthias Brugger,
	AngeloGioacchino Del Regno, wsd_upstream, linux-kernel,
	linux-arm-kernel, linux-mediatek

On Wed, Sep 27, 2023 at 11:34:28AM +0800, Kuyo Chang wrote:
> From: kuyo chang <kuyo.chang@mediatek.com>
> 
> [Syndrome] hung detect shows below warning msg
> [ 4320.666557] [   T56] khungtaskd: [name:hung_task&]INFO: task stressapptest:17803 blocked for more than 3600 seconds.
> [ 4320.666589] [   T56] khungtaskd: [name:core&]task:stressapptest   state:D stack:0     pid:17803 ppid:17579  flags:0x04000008
> [ 4320.666601] [   T56] khungtaskd: Call trace:
> [ 4320.666607] [   T56] khungtaskd:  __switch_to+0x17c/0x338
> [ 4320.666642] [   T56] khungtaskd:  __schedule+0x54c/0x8ec
> [ 4320.666651] [   T56] khungtaskd:  schedule+0x74/0xd4
> [ 4320.666656] [   T56] khungtaskd:  schedule_timeout+0x34/0x108
> [ 4320.666672] [   T56] khungtaskd:  do_wait_for_common+0xe0/0x154
> [ 4320.666678] [   T56] khungtaskd:  wait_for_completion+0x44/0x58
> [ 4320.666681] [   T56] khungtaskd:  __set_cpus_allowed_ptr_locked+0x344/0x730
> [ 4320.666702] [   T56] khungtaskd:  __sched_setaffinity+0x118/0x160
> [ 4320.666709] [   T56] khungtaskd:  sched_setaffinity+0x10c/0x248
> [ 4320.666715] [   T56] khungtaskd:  __arm64_sys_sched_setaffinity+0x15c/0x1c0
> [ 4320.666719] [   T56] khungtaskd:  invoke_syscall+0x3c/0xf8
> [ 4320.666743] [   T56] khungtaskd:  el0_svc_common+0xb0/0xe8
> [ 4320.666749] [   T56] khungtaskd:  do_el0_svc+0x28/0xa8
> [ 4320.666755] [   T56] khungtaskd:  el0_svc+0x28/0x9c
> [ 4320.666761] [   T56] khungtaskd:  el0t_64_sync_handler+0x7c/0xe4
> [ 4320.666766] [   T56] khungtaskd:  el0t_64_sync+0x18c/0x190
> 
> [Analysis]
> 
> After add some debug footprint massage, this issue happened at stopper
> disable case.
> It cannot exec migration_cpu_stop fun to complete migration.
> This will cause stuck on wait_for_completion.

How did you get in this situation?

> Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
> ---
>  kernel/sched/core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1dc0b0287e30..98c217a1caa0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3041,8 +3041,9 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
>  		task_rq_unlock(rq, p, rf);
>  
>  		if (!stop_pending) {
> -			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> -					    &pending->arg, &pending->stop_work);
> +			if (!stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> +					    &pending->arg, &pending->stop_work))
> +				return -ENOENT;

And -ENOENT is the right return code for when the target CPU is not
available?

I suspect you're missing more than halp the picture and this is a
band-aid solution at best. Please try harder.

>  		}
>  
>  		if (flags & SCA_MIGRATE_ENABLE)
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-09-27  8:08   ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-09-27  8:08 UTC (permalink / raw)
  To: Kuyo Chang
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Matthias Brugger,
	AngeloGioacchino Del Regno, wsd_upstream, linux-kernel,
	linux-arm-kernel, linux-mediatek

On Wed, Sep 27, 2023 at 11:34:28AM +0800, Kuyo Chang wrote:
> From: kuyo chang <kuyo.chang@mediatek.com>
> 
> [Syndrome] hung detect shows below warning msg
> [ 4320.666557] [   T56] khungtaskd: [name:hung_task&]INFO: task stressapptest:17803 blocked for more than 3600 seconds.
> [ 4320.666589] [   T56] khungtaskd: [name:core&]task:stressapptest   state:D stack:0     pid:17803 ppid:17579  flags:0x04000008
> [ 4320.666601] [   T56] khungtaskd: Call trace:
> [ 4320.666607] [   T56] khungtaskd:  __switch_to+0x17c/0x338
> [ 4320.666642] [   T56] khungtaskd:  __schedule+0x54c/0x8ec
> [ 4320.666651] [   T56] khungtaskd:  schedule+0x74/0xd4
> [ 4320.666656] [   T56] khungtaskd:  schedule_timeout+0x34/0x108
> [ 4320.666672] [   T56] khungtaskd:  do_wait_for_common+0xe0/0x154
> [ 4320.666678] [   T56] khungtaskd:  wait_for_completion+0x44/0x58
> [ 4320.666681] [   T56] khungtaskd:  __set_cpus_allowed_ptr_locked+0x344/0x730
> [ 4320.666702] [   T56] khungtaskd:  __sched_setaffinity+0x118/0x160
> [ 4320.666709] [   T56] khungtaskd:  sched_setaffinity+0x10c/0x248
> [ 4320.666715] [   T56] khungtaskd:  __arm64_sys_sched_setaffinity+0x15c/0x1c0
> [ 4320.666719] [   T56] khungtaskd:  invoke_syscall+0x3c/0xf8
> [ 4320.666743] [   T56] khungtaskd:  el0_svc_common+0xb0/0xe8
> [ 4320.666749] [   T56] khungtaskd:  do_el0_svc+0x28/0xa8
> [ 4320.666755] [   T56] khungtaskd:  el0_svc+0x28/0x9c
> [ 4320.666761] [   T56] khungtaskd:  el0t_64_sync_handler+0x7c/0xe4
> [ 4320.666766] [   T56] khungtaskd:  el0t_64_sync+0x18c/0x190
> 
> [Analysis]
> 
> After add some debug footprint massage, this issue happened at stopper
> disable case.
> It cannot exec migration_cpu_stop fun to complete migration.
> This will cause stuck on wait_for_completion.

How did you get in this situation?

> Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
> ---
>  kernel/sched/core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1dc0b0287e30..98c217a1caa0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3041,8 +3041,9 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
>  		task_rq_unlock(rq, p, rf);
>  
>  		if (!stop_pending) {
> -			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> -					    &pending->arg, &pending->stop_work);
> +			if (!stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> +					    &pending->arg, &pending->stop_work))
> +				return -ENOENT;

And -ENOENT is the right return code for when the target CPU is not
available?

I suspect you're missing more than halp the picture and this is a
band-aid solution at best. Please try harder.

>  		}
>  
>  		if (flags & SCA_MIGRATE_ENABLE)
> -- 
> 2.18.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
  2023-09-27  8:08   ` Peter Zijlstra
@ 2023-09-27 15:57     ` Kuyo Chang (張建文)
  -1 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang (張建文) @ 2023-09-27 15:57 UTC (permalink / raw)
  To: peterz
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Wed, 2023-09-27 at 10:08 +0200, Peter Zijlstra wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Wed, Sep 27, 2023 at 11:34:28AM +0800, Kuyo Chang wrote:
> > From: kuyo chang <kuyo.chang@mediatek.com>
> > 
> > [Syndrome] hung detect shows below warning msg
> > [ 4320.666557] [   T56] khungtaskd: [name:hung_task&]INFO: task
> stressapptest:17803 blocked for more than 3600 seconds.
> > [ 4320.666589] [   T56] khungtaskd:
> [name:core&]task:stressapptest   state:D stack:0     pid:17803
> ppid:17579  flags:0x04000008
> > [ 4320.666601] [   T56] khungtaskd: Call trace:
> > [ 4320.666607] [   T56] khungtaskd:  __switch_to+0x17c/0x338
> > [ 4320.666642] [   T56] khungtaskd:  __schedule+0x54c/0x8ec
> > [ 4320.666651] [   T56] khungtaskd:  schedule+0x74/0xd4
> > [ 4320.666656] [   T56] khungtaskd:  schedule_timeout+0x34/0x108
> > [ 4320.666672] [   T56] khungtaskd:  do_wait_for_common+0xe0/0x154
> > [ 4320.666678] [   T56] khungtaskd:  wait_for_completion+0x44/0x58
> > [ 4320.666681] [   T56]
> khungtaskd:  __set_cpus_allowed_ptr_locked+0x344/0x730
> > [ 4320.666702] [   T56]
> khungtaskd:  __sched_setaffinity+0x118/0x160
> > [ 4320.666709] [   T56] khungtaskd:  sched_setaffinity+0x10c/0x248
> > [ 4320.666715] [   T56]
> khungtaskd:  __arm64_sys_sched_setaffinity+0x15c/0x1c0
> > [ 4320.666719] [   T56] khungtaskd:  invoke_syscall+0x3c/0xf8
> > [ 4320.666743] [   T56] khungtaskd:  el0_svc_common+0xb0/0xe8
> > [ 4320.666749] [   T56] khungtaskd:  do_el0_svc+0x28/0xa8
> > [ 4320.666755] [   T56] khungtaskd:  el0_svc+0x28/0x9c
> > [ 4320.666761] [   T56] khungtaskd:  el0t_64_sync_handler+0x7c/0xe4
> > [ 4320.666766] [   T56] khungtaskd:  el0t_64_sync+0x18c/0x190
> > 
> > [Analysis]
> > 
> > After add some debug footprint massage, this issue happened at
> stopper
> > disable case.
> > It cannot exec migration_cpu_stop fun to complete migration.
> > This will cause stuck on wait_for_completion.
> 
> How did you get in this situation?
> 

This issue occurs at CPU hotplug/set_affinity stress test.
The reproduce ratio is very low(about once a week).

So I add/record some debug message to snapshot the task status while it
stuck on wait_for_completion.

Below is the snapshot status while issue happened:

cpu_active_mask is 0xFC
new_mask is 0x8
pending->arg.dest_cpu is 0x3
task_on_cpu(rq,p) is 1
task_cpu is 0x2
p__state = TASK_RUNNING
flag is SCA_CHACK|SCA_USER
stop_one_cpu_nowait(stopper->enabled) return value is false.

I also record the footprint at migration_cpu_stop.
It shows the migration_cpu_stop is not execute.


> > Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
> > ---
> >  kernel/sched/core.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 1dc0b0287e30..98c217a1caa0 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3041,8 +3041,9 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
> >  task_rq_unlock(rq, p, rf);
> >  
> >  if (!stop_pending) {
> > -stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> > -    &pending->arg, &pending->stop_work);
> > +if (!stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> > +    &pending->arg, &pending->stop_work))
> > +return -ENOENT;
> 
> And -ENOENT is the right return code for when the target CPU is not
> available?
> 
> I suspect you're missing more than halp the picture and this is a
> band-aid solution at best. Please try harder.
> 

I think -ENOENT means stopper is not execute? 
Perhaps the error code is abused, or could you kindly give me some
suggestions?

Thanks,
Kuyo

> >  }
> >  
> >  if (flags & SCA_MIGRATE_ENABLE)
> > -- 
> > 2.18.0
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-09-27 15:57     ` Kuyo Chang (張建文)
  0 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang (張建文) @ 2023-09-27 15:57 UTC (permalink / raw)
  To: peterz
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Wed, 2023-09-27 at 10:08 +0200, Peter Zijlstra wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Wed, Sep 27, 2023 at 11:34:28AM +0800, Kuyo Chang wrote:
> > From: kuyo chang <kuyo.chang@mediatek.com>
> > 
> > [Syndrome] hung detect shows below warning msg
> > [ 4320.666557] [   T56] khungtaskd: [name:hung_task&]INFO: task
> stressapptest:17803 blocked for more than 3600 seconds.
> > [ 4320.666589] [   T56] khungtaskd:
> [name:core&]task:stressapptest   state:D stack:0     pid:17803
> ppid:17579  flags:0x04000008
> > [ 4320.666601] [   T56] khungtaskd: Call trace:
> > [ 4320.666607] [   T56] khungtaskd:  __switch_to+0x17c/0x338
> > [ 4320.666642] [   T56] khungtaskd:  __schedule+0x54c/0x8ec
> > [ 4320.666651] [   T56] khungtaskd:  schedule+0x74/0xd4
> > [ 4320.666656] [   T56] khungtaskd:  schedule_timeout+0x34/0x108
> > [ 4320.666672] [   T56] khungtaskd:  do_wait_for_common+0xe0/0x154
> > [ 4320.666678] [   T56] khungtaskd:  wait_for_completion+0x44/0x58
> > [ 4320.666681] [   T56]
> khungtaskd:  __set_cpus_allowed_ptr_locked+0x344/0x730
> > [ 4320.666702] [   T56]
> khungtaskd:  __sched_setaffinity+0x118/0x160
> > [ 4320.666709] [   T56] khungtaskd:  sched_setaffinity+0x10c/0x248
> > [ 4320.666715] [   T56]
> khungtaskd:  __arm64_sys_sched_setaffinity+0x15c/0x1c0
> > [ 4320.666719] [   T56] khungtaskd:  invoke_syscall+0x3c/0xf8
> > [ 4320.666743] [   T56] khungtaskd:  el0_svc_common+0xb0/0xe8
> > [ 4320.666749] [   T56] khungtaskd:  do_el0_svc+0x28/0xa8
> > [ 4320.666755] [   T56] khungtaskd:  el0_svc+0x28/0x9c
> > [ 4320.666761] [   T56] khungtaskd:  el0t_64_sync_handler+0x7c/0xe4
> > [ 4320.666766] [   T56] khungtaskd:  el0t_64_sync+0x18c/0x190
> > 
> > [Analysis]
> > 
> > After add some debug footprint massage, this issue happened at
> stopper
> > disable case.
> > It cannot exec migration_cpu_stop fun to complete migration.
> > This will cause stuck on wait_for_completion.
> 
> How did you get in this situation?
> 

This issue occurs at CPU hotplug/set_affinity stress test.
The reproduce ratio is very low(about once a week).

So I add/record some debug message to snapshot the task status while it
stuck on wait_for_completion.

Below is the snapshot status while issue happened:

cpu_active_mask is 0xFC
new_mask is 0x8
pending->arg.dest_cpu is 0x3
task_on_cpu(rq,p) is 1
task_cpu is 0x2
p__state = TASK_RUNNING
flag is SCA_CHACK|SCA_USER
stop_one_cpu_nowait(stopper->enabled) return value is false.

I also record the footprint at migration_cpu_stop.
It shows the migration_cpu_stop is not execute.


> > Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
> > ---
> >  kernel/sched/core.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 1dc0b0287e30..98c217a1caa0 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3041,8 +3041,9 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
> >  task_rq_unlock(rq, p, rf);
> >  
> >  if (!stop_pending) {
> > -stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> > -    &pending->arg, &pending->stop_work);
> > +if (!stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> > +    &pending->arg, &pending->stop_work))
> > +return -ENOENT;
> 
> And -ENOENT is the right return code for when the target CPU is not
> available?
> 
> I suspect you're missing more than halp the picture and this is a
> band-aid solution at best. Please try harder.
> 

I think -ENOENT means stopper is not execute? 
Perhaps the error code is abused, or could you kindly give me some
suggestions?

Thanks,
Kuyo

> >  }
> >  
> >  if (flags & SCA_MIGRATE_ENABLE)
> > -- 
> > 2.18.0
> > 
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
  2023-09-27 15:57     ` Kuyo Chang (張建文)
@ 2023-09-28 15:16       ` Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-09-28 15:16 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:
> On Wed, 2023-09-27 at 10:08 +0200, Peter Zijlstra wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  On Wed, Sep 27, 2023 at 11:34:28AM +0800, Kuyo Chang wrote:
> > > From: kuyo chang <kuyo.chang@mediatek.com>
> > > 
> > > [Syndrome] hung detect shows below warning msg
> > > [ 4320.666557] [   T56] khungtaskd: [name:hung_task&]INFO: task
> > stressapptest:17803 blocked for more than 3600 seconds.
> > > [ 4320.666589] [   T56] khungtaskd:
> > [name:core&]task:stressapptest   state:D stack:0     pid:17803
> > ppid:17579  flags:0x04000008
> > > [ 4320.666601] [   T56] khungtaskd: Call trace:
> > > [ 4320.666607] [   T56] khungtaskd:  __switch_to+0x17c/0x338
> > > [ 4320.666642] [   T56] khungtaskd:  __schedule+0x54c/0x8ec
> > > [ 4320.666651] [   T56] khungtaskd:  schedule+0x74/0xd4
> > > [ 4320.666656] [   T56] khungtaskd:  schedule_timeout+0x34/0x108
> > > [ 4320.666672] [   T56] khungtaskd:  do_wait_for_common+0xe0/0x154
> > > [ 4320.666678] [   T56] khungtaskd:  wait_for_completion+0x44/0x58
> > > [ 4320.666681] [   T56]
> > khungtaskd:  __set_cpus_allowed_ptr_locked+0x344/0x730
> > > [ 4320.666702] [   T56]
> > khungtaskd:  __sched_setaffinity+0x118/0x160
> > > [ 4320.666709] [   T56] khungtaskd:  sched_setaffinity+0x10c/0x248
> > > [ 4320.666715] [   T56]
> > khungtaskd:  __arm64_sys_sched_setaffinity+0x15c/0x1c0
> > > [ 4320.666719] [   T56] khungtaskd:  invoke_syscall+0x3c/0xf8
> > > [ 4320.666743] [   T56] khungtaskd:  el0_svc_common+0xb0/0xe8
> > > [ 4320.666749] [   T56] khungtaskd:  do_el0_svc+0x28/0xa8
> > > [ 4320.666755] [   T56] khungtaskd:  el0_svc+0x28/0x9c
> > > [ 4320.666761] [   T56] khungtaskd:  el0t_64_sync_handler+0x7c/0xe4
> > > [ 4320.666766] [   T56] khungtaskd:  el0t_64_sync+0x18c/0x190
> > > 
> > > [Analysis]
> > > 
> > > After add some debug footprint massage, this issue happened at
> > stopper
> > > disable case.
> > > It cannot exec migration_cpu_stop fun to complete migration.
> > > This will cause stuck on wait_for_completion.
> > 
> > How did you get in this situation?
> > 
> 
> This issue occurs at CPU hotplug/set_affinity stress test.
> The reproduce ratio is very low(about once a week).
> 
> So I add/record some debug message to snapshot the task status while it
> stuck on wait_for_completion.
> 
> Below is the snapshot status while issue happened:
> 
> cpu_active_mask is 0xFC
> new_mask is 0x8
> pending->arg.dest_cpu is 0x3
> task_on_cpu(rq,p) is 1
> task_cpu is 0x2
> p__state = TASK_RUNNING
> flag is SCA_CHACK|SCA_USER
> stop_one_cpu_nowait(stopper->enabled) return value is false.
> 
> I also record the footprint at migration_cpu_stop.
> It shows the migration_cpu_stop is not execute.

AFAICT this is migrate_enable(), which acts on current, so how can the
CPU that current runs on go away?

That is completely unexplained. You've not given a proper description of
the race scenario. And because you've not, we can't even begin to talk
about how best to address the issue.

> > struct task_struct *p, struct rq_flag
> > >  task_rq_unlock(rq, p, rf);
> > >  
> > >  if (!stop_pending) {
> > > -stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> > > -    &pending->arg, &pending->stop_work);
> > > +if (!stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> > > +    &pending->arg, &pending->stop_work))
> > > +return -ENOENT;
> > 
> > And -ENOENT is the right return code for when the target CPU is not
> > available?
> > 
> > I suspect you're missing more than halp the picture and this is a
> > band-aid solution at best. Please try harder.
> > 
> 
> I think -ENOENT means stopper is not execute? 
> Perhaps the error code is abused, or could you kindly give me some
> suggestions?

Well, at this point you're leaving the whole affine_move_task()
machinery in an undefined state, which is a much bigger problem than the
weird return value.

Please read through that function and its comments a number of times. If
you're not a little nervous, you've not understood the thing.

Your patch has at least one very obvious resource leak.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-09-28 15:16       ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-09-28 15:16 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:
> On Wed, 2023-09-27 at 10:08 +0200, Peter Zijlstra wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  On Wed, Sep 27, 2023 at 11:34:28AM +0800, Kuyo Chang wrote:
> > > From: kuyo chang <kuyo.chang@mediatek.com>
> > > 
> > > [Syndrome] hung detect shows below warning msg
> > > [ 4320.666557] [   T56] khungtaskd: [name:hung_task&]INFO: task
> > stressapptest:17803 blocked for more than 3600 seconds.
> > > [ 4320.666589] [   T56] khungtaskd:
> > [name:core&]task:stressapptest   state:D stack:0     pid:17803
> > ppid:17579  flags:0x04000008
> > > [ 4320.666601] [   T56] khungtaskd: Call trace:
> > > [ 4320.666607] [   T56] khungtaskd:  __switch_to+0x17c/0x338
> > > [ 4320.666642] [   T56] khungtaskd:  __schedule+0x54c/0x8ec
> > > [ 4320.666651] [   T56] khungtaskd:  schedule+0x74/0xd4
> > > [ 4320.666656] [   T56] khungtaskd:  schedule_timeout+0x34/0x108
> > > [ 4320.666672] [   T56] khungtaskd:  do_wait_for_common+0xe0/0x154
> > > [ 4320.666678] [   T56] khungtaskd:  wait_for_completion+0x44/0x58
> > > [ 4320.666681] [   T56]
> > khungtaskd:  __set_cpus_allowed_ptr_locked+0x344/0x730
> > > [ 4320.666702] [   T56]
> > khungtaskd:  __sched_setaffinity+0x118/0x160
> > > [ 4320.666709] [   T56] khungtaskd:  sched_setaffinity+0x10c/0x248
> > > [ 4320.666715] [   T56]
> > khungtaskd:  __arm64_sys_sched_setaffinity+0x15c/0x1c0
> > > [ 4320.666719] [   T56] khungtaskd:  invoke_syscall+0x3c/0xf8
> > > [ 4320.666743] [   T56] khungtaskd:  el0_svc_common+0xb0/0xe8
> > > [ 4320.666749] [   T56] khungtaskd:  do_el0_svc+0x28/0xa8
> > > [ 4320.666755] [   T56] khungtaskd:  el0_svc+0x28/0x9c
> > > [ 4320.666761] [   T56] khungtaskd:  el0t_64_sync_handler+0x7c/0xe4
> > > [ 4320.666766] [   T56] khungtaskd:  el0t_64_sync+0x18c/0x190
> > > 
> > > [Analysis]
> > > 
> > > After add some debug footprint massage, this issue happened at
> > stopper
> > > disable case.
> > > It cannot exec migration_cpu_stop fun to complete migration.
> > > This will cause stuck on wait_for_completion.
> > 
> > How did you get in this situation?
> > 
> 
> This issue occurs at CPU hotplug/set_affinity stress test.
> The reproduce ratio is very low(about once a week).
> 
> So I add/record some debug message to snapshot the task status while it
> stuck on wait_for_completion.
> 
> Below is the snapshot status while issue happened:
> 
> cpu_active_mask is 0xFC
> new_mask is 0x8
> pending->arg.dest_cpu is 0x3
> task_on_cpu(rq,p) is 1
> task_cpu is 0x2
> p__state = TASK_RUNNING
> flag is SCA_CHACK|SCA_USER
> stop_one_cpu_nowait(stopper->enabled) return value is false.
> 
> I also record the footprint at migration_cpu_stop.
> It shows the migration_cpu_stop is not execute.

AFAICT this is migrate_enable(), which acts on current, so how can the
CPU that current runs on go away?

That is completely unexplained. You've not given a proper description of
the race scenario. And because you've not, we can't even begin to talk
about how best to address the issue.

> > struct task_struct *p, struct rq_flag
> > >  task_rq_unlock(rq, p, rf);
> > >  
> > >  if (!stop_pending) {
> > > -stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> > > -    &pending->arg, &pending->stop_work);
> > > +if (!stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> > > +    &pending->arg, &pending->stop_work))
> > > +return -ENOENT;
> > 
> > And -ENOENT is the right return code for when the target CPU is not
> > available?
> > 
> > I suspect you're missing more than halp the picture and this is a
> > band-aid solution at best. Please try harder.
> > 
> 
> I think -ENOENT means stopper is not execute? 
> Perhaps the error code is abused, or could you kindly give me some
> suggestions?

Well, at this point you're leaving the whole affine_move_task()
machinery in an undefined state, which is a much bigger problem than the
weird return value.

Please read through that function and its comments a number of times. If
you're not a little nervous, you've not understood the thing.

Your patch has at least one very obvious resource leak.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
  2023-09-28 15:16       ` Peter Zijlstra
@ 2023-09-28 15:19         ` Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-09-28 15:19 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Thu, Sep 28, 2023 at 05:16:16PM +0200, Peter Zijlstra wrote:

> AFAICT this is migrate_enable(), which acts on current, so how can the
> CPU that current runs on go away?

> Your patch has at least one very obvious resource leak.

Sorry those are not so, I ended up staring at the wrong
stop_one_cpu_nowait() :-/

Still, the rest is very much the case, if you can't describe the exact
race scenario, you can't be talking about a solution.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-09-28 15:19         ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-09-28 15:19 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Thu, Sep 28, 2023 at 05:16:16PM +0200, Peter Zijlstra wrote:

> AFAICT this is migrate_enable(), which acts on current, so how can the
> CPU that current runs on go away?

> Your patch has at least one very obvious resource leak.

Sorry those are not so, I ended up staring at the wrong
stop_one_cpu_nowait() :-/

Still, the rest is very much the case, if you can't describe the exact
race scenario, you can't be talking about a solution.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
  2023-09-27 15:57     ` Kuyo Chang (張建文)
@ 2023-09-29 10:21       ` Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-09-29 10:21 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:

> This issue occurs at CPU hotplug/set_affinity stress test.
> The reproduce ratio is very low(about once a week).

I'm assuming you're running an arm64 kernel with preempt_full=y (the
default for arm64).

Could you please test the below?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d8fd29d66b24..079a63b8a954 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data)
 		 * it.
 		 */
 		WARN_ON_ONCE(!pending->stop_pending);
+		preempt_disable();
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
+		preempt_enable();
 		return 0;
 	}
 out:
@@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			complete = true;
 		}
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (push_task) {
 			stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
 					    p, &rq->push_work);
 		}
+		preempt_enable();
 
 		if (complete)
 			complete_all(&pending->done);
@@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (!stop_pending) {
 			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 					    &pending->arg, &pending->stop_work);
 		}
+		preempt_enable();
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;
@@ -9459,6 +9461,7 @@ static void balance_push(struct rq *rq)
 	 * Temporarily drop rq->lock such that we can wake-up the stop task.
 	 * Both preemption and IRQs are still disabled.
 	 */
+	preempt_disable();
 	raw_spin_rq_unlock(rq);
 	stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
 			    this_cpu_ptr(&push_work));
@@ -9468,6 +9471,7 @@ static void balance_push(struct rq *rq)
 	 * which kthread_is_per_cpu() and will push this task away.
 	 */
 	raw_spin_rq_lock(rq);
+	preempt_enable();
 }
 
 static void balance_push_set(int cpu, bool on)

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-09-29 10:21       ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-09-29 10:21 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:

> This issue occurs at CPU hotplug/set_affinity stress test.
> The reproduce ratio is very low(about once a week).

I'm assuming you're running an arm64 kernel with preempt_full=y (the
default for arm64).

Could you please test the below?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d8fd29d66b24..079a63b8a954 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data)
 		 * it.
 		 */
 		WARN_ON_ONCE(!pending->stop_pending);
+		preempt_disable();
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
+		preempt_enable();
 		return 0;
 	}
 out:
@@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			complete = true;
 		}
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (push_task) {
 			stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
 					    p, &rq->push_work);
 		}
+		preempt_enable();
 
 		if (complete)
 			complete_all(&pending->done);
@@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (!stop_pending) {
 			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 					    &pending->arg, &pending->stop_work);
 		}
+		preempt_enable();
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;
@@ -9459,6 +9461,7 @@ static void balance_push(struct rq *rq)
 	 * Temporarily drop rq->lock such that we can wake-up the stop task.
 	 * Both preemption and IRQs are still disabled.
 	 */
+	preempt_disable();
 	raw_spin_rq_unlock(rq);
 	stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
 			    this_cpu_ptr(&push_work));
@@ -9468,6 +9471,7 @@ static void balance_push(struct rq *rq)
 	 * which kthread_is_per_cpu() and will push this task away.
 	 */
 	raw_spin_rq_lock(rq);
+	preempt_enable();
 }
 
 static void balance_push_set(int cpu, bool on)

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
  2023-09-29 10:21       ` Peter Zijlstra
@ 2023-10-01 15:15         ` Kuyo Chang (張建文)
  -1 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang (張建文) @ 2023-10-01 15:15 UTC (permalink / raw)
  To: peterz
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Fri, 2023-09-29 at 12:21 +0200, Peter Zijlstra wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:
> 
> > This issue occurs at CPU hotplug/set_affinity stress test.
> > The reproduce ratio is very low(about once a week).
> 
> I'm assuming you're running an arm64 kernel with preempt_full=y (the
> default for arm64).

Yes, the test platform is arm64 with kernel config as below

CONFIG_PREEMPT_BUILD=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_KEY=y
CONFIG_PREEMPT_NOTIFIERS=y

> Could you please test the below?

Ok, let me run it and report.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d8fd29d66b24..079a63b8a954 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data)
>   * it.
>   */
>  WARN_ON_ONCE(!pending->stop_pending);
> +preempt_disable();
>  task_rq_unlock(rq, p, &rf);
>  stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
> +preempt_enable();
>  return 0;
>  }
>  out:
> @@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
>  complete = true;
>  }
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (push_task) {
>  stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
>      p, &rq->push_work);
>  }
> +preempt_enable();
>  
>  if (complete)
>  complete_all(&pending->done);
> @@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
>  if (flags & SCA_MIGRATE_ENABLE)
>  p->migration_flags &= ~MDF_PUSH;
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (!stop_pending) {
>  stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
>  }
> +preempt_enable();
>  
>  if (flags & SCA_MIGRATE_ENABLE)
>  return 0;
> @@ -9459,6 +9461,7 @@ static void balance_push(struct rq *rq)
>   * Temporarily drop rq->lock such that we can wake-up the stop task.
>   * Both preemption and IRQs are still disabled.
>   */
> +preempt_disable();
>  raw_spin_rq_unlock(rq);
>  stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
>      this_cpu_ptr(&push_work));
> @@ -9468,6 +9471,7 @@ static void balance_push(struct rq *rq)
>   * which kthread_is_per_cpu() and will push this task away.
>   */
>  raw_spin_rq_lock(rq);
> +preempt_enable();
>  }
>  
>  static void balance_push_set(int cpu, bool on)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-10-01 15:15         ` Kuyo Chang (張建文)
  0 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang (張建文) @ 2023-10-01 15:15 UTC (permalink / raw)
  To: peterz
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Fri, 2023-09-29 at 12:21 +0200, Peter Zijlstra wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:
> 
> > This issue occurs at CPU hotplug/set_affinity stress test.
> > The reproduce ratio is very low(about once a week).
> 
> I'm assuming you're running an arm64 kernel with preempt_full=y (the
> default for arm64).

Yes, the test platform is arm64 with kernel config as below

CONFIG_PREEMPT_BUILD=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_KEY=y
CONFIG_PREEMPT_NOTIFIERS=y

> Could you please test the below?

Ok, let me run it and report.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d8fd29d66b24..079a63b8a954 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data)
>   * it.
>   */
>  WARN_ON_ONCE(!pending->stop_pending);
> +preempt_disable();
>  task_rq_unlock(rq, p, &rf);
>  stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
> +preempt_enable();
>  return 0;
>  }
>  out:
> @@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
>  complete = true;
>  }
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (push_task) {
>  stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
>      p, &rq->push_work);
>  }
> +preempt_enable();
>  
>  if (complete)
>  complete_all(&pending->done);
> @@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
>  if (flags & SCA_MIGRATE_ENABLE)
>  p->migration_flags &= ~MDF_PUSH;
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (!stop_pending) {
>  stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
>  }
> +preempt_enable();
>  
>  if (flags & SCA_MIGRATE_ENABLE)
>  return 0;
> @@ -9459,6 +9461,7 @@ static void balance_push(struct rq *rq)
>   * Temporarily drop rq->lock such that we can wake-up the stop task.
>   * Both preemption and IRQs are still disabled.
>   */
> +preempt_disable();
>  raw_spin_rq_unlock(rq);
>  stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
>      this_cpu_ptr(&push_work));
> @@ -9468,6 +9471,7 @@ static void balance_push(struct rq *rq)
>   * which kthread_is_per_cpu() and will push this task away.
>   */
>  raw_spin_rq_lock(rq);
> +preempt_enable();
>  }
>  
>  static void balance_push_set(int cpu, bool on)
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
  2023-09-29 10:21       ` Peter Zijlstra
@ 2023-10-10 14:40         ` Kuyo Chang (張建文)
  -1 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang (張建文) @ 2023-10-10 14:40 UTC (permalink / raw)
  To: peterz
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Fri, 2023-09-29 at 12:21 +0200, Peter Zijlstra wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:
> 
> > This issue occurs at CPU hotplug/set_affinity stress test.
> > The reproduce ratio is very low(about once a week).
> 
> I'm assuming you're running an arm64 kernel with preempt_full=y (the
> default for arm64).
> 
> Could you please test the below?
> 

It is running good so far(more than a week)on hotplug/set affinity
stress test. I will keep it testing and report back if it happens
again.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d8fd29d66b24..079a63b8a954 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data)
>   * it.
>   */
>  WARN_ON_ONCE(!pending->stop_pending);
> +preempt_disable();
>  task_rq_unlock(rq, p, &rf);
>  stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
> +preempt_enable();
>  return 0;
>  }
>  out:
> @@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
>  complete = true;
>  }
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (push_task) {
>  stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
>      p, &rq->push_work);
>  }
> +preempt_enable();
>  
>  if (complete)
>  complete_all(&pending->done);
> @@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
>  if (flags & SCA_MIGRATE_ENABLE)
>  p->migration_flags &= ~MDF_PUSH;
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (!stop_pending) {
>  stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
>  }
> +preempt_enable();
>  
>  if (flags & SCA_MIGRATE_ENABLE)
>  return 0;
> @@ -9459,6 +9461,7 @@ static void balance_push(struct rq *rq)
>   * Temporarily drop rq->lock such that we can wake-up the stop task.
>   * Both preemption and IRQs are still disabled.
>   */
> +preempt_disable();
>  raw_spin_rq_unlock(rq);
>  stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
>      this_cpu_ptr(&push_work));
> @@ -9468,6 +9471,7 @@ static void balance_push(struct rq *rq)
>   * which kthread_is_per_cpu() and will push this task away.
>   */
>  raw_spin_rq_lock(rq);
> +preempt_enable();
>  }
>  
>  static void balance_push_set(int cpu, bool on)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-10-10 14:40         ` Kuyo Chang (張建文)
  0 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang (張建文) @ 2023-10-10 14:40 UTC (permalink / raw)
  To: peterz
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Fri, 2023-09-29 at 12:21 +0200, Peter Zijlstra wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:
> 
> > This issue occurs at CPU hotplug/set_affinity stress test.
> > The reproduce ratio is very low(about once a week).
> 
> I'm assuming you're running an arm64 kernel with preempt_full=y (the
> default for arm64).
> 
> Could you please test the below?
> 

It is running good so far(more than a week)on hotplug/set affinity
stress test. I will keep it testing and report back if it happens
again.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d8fd29d66b24..079a63b8a954 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data)
>   * it.
>   */
>  WARN_ON_ONCE(!pending->stop_pending);
> +preempt_disable();
>  task_rq_unlock(rq, p, &rf);
>  stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
> +preempt_enable();
>  return 0;
>  }
>  out:
> @@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
>  complete = true;
>  }
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (push_task) {
>  stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
>      p, &rq->push_work);
>  }
> +preempt_enable();
>  
>  if (complete)
>  complete_all(&pending->done);
> @@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *rq,
> struct task_struct *p, struct rq_flag
>  if (flags & SCA_MIGRATE_ENABLE)
>  p->migration_flags &= ~MDF_PUSH;
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (!stop_pending) {
>  stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
>  }
> +preempt_enable();
>  
>  if (flags & SCA_MIGRATE_ENABLE)
>  return 0;
> @@ -9459,6 +9461,7 @@ static void balance_push(struct rq *rq)
>   * Temporarily drop rq->lock such that we can wake-up the stop task.
>   * Both preemption and IRQs are still disabled.
>   */
> +preempt_disable();
>  raw_spin_rq_unlock(rq);
>  stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
>      this_cpu_ptr(&push_work));
> @@ -9468,6 +9471,7 @@ static void balance_push(struct rq *rq)
>   * which kthread_is_per_cpu() and will push this task away.
>   */
>  raw_spin_rq_lock(rq);
> +preempt_enable();
>  }
>  
>  static void balance_push_set(int cpu, bool on)
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
  2023-10-10 14:40         ` Kuyo Chang (張建文)
@ 2023-10-10 14:57           ` Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-10-10 14:57 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Tue, Oct 10, 2023 at 02:40:22PM +0000, Kuyo Chang (張建文) wrote:
> On Fri, 2023-09-29 at 12:21 +0200, Peter Zijlstra wrote:
> >  	 
> >  On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:
> > 
> > > This issue occurs at CPU hotplug/set_affinity stress test.
> > > The reproduce ratio is very low(about once a week).
> > 
> > I'm assuming you're running an arm64 kernel with preempt_full=y (the
> > default for arm64).
> > 
> > Could you please test the below?
> > 
> 
> It is running good so far(more than a week)on hotplug/set affinity
> stress test. I will keep it testing and report back if it happens
> again.

OK, I suppose I should look at writing a coherent Changelog for this
then...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
@ 2023-10-10 14:57           ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-10-10 14:57 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Tue, Oct 10, 2023 at 02:40:22PM +0000, Kuyo Chang (張建文) wrote:
> On Fri, 2023-09-29 at 12:21 +0200, Peter Zijlstra wrote:
> >  	 
> >  On Wed, Sep 27, 2023 at 03:57:35PM +0000, Kuyo Chang (張建文) wrote:
> > 
> > > This issue occurs at CPU hotplug/set_affinity stress test.
> > > The reproduce ratio is very low(about once a week).
> > 
> > I'm assuming you're running an arm64 kernel with preempt_full=y (the
> > default for arm64).
> > 
> > Could you please test the below?
> > 
> 
> It is running good so far(more than a week)on hotplug/set affinity
> stress test. I will keep it testing and report back if it happens
> again.

OK, I suppose I should look at writing a coherent Changelog for this
then...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] sched: Fix stop_one_cpu_nowait() vs hotplug
  2023-10-10 14:57           ` Peter Zijlstra
@ 2023-10-10 20:04             ` Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-10-10 20:04 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Tue, Oct 10, 2023 at 04:57:47PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 10, 2023 at 02:40:22PM +0000, Kuyo Chang (張建文) wrote:

> > It is running good so far(more than a week)on hotplug/set affinity
> > stress test. I will keep it testing and report back if it happens
> > again.
> 
> OK, I suppose I should look at writing a coherent Changelog for this
> then...

Something like the below... ?

---
Subject: sched: Fix stop_one_cpu_nowait() vs hotplug
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Oct 10 20:57:39 CEST 2023

Kuyo reported sporadic failures on a sched_setaffinity() vs CPU
hotplug stress-test -- notably affine_move_task() remains stuck in
wait_for_completion(), leading to a hung-task detector warning.

Specifically, it was reported that stop_one_cpu_nowait(.fn =
migration_cpu_stop) returns false -- this stopper is responsible for
the matching complete().

The race scenario is:

	CPU0					CPU1

					// doing _cpu_down()

  __set_cpus_allowed_ptr()
    task_rq_lock();
					takedown_cpu()
					  stop_machine_cpuslocked(take_cpu_down..)
	
					<PREEMPT: cpu_stopper_thread()
					  MULTI_STOP_PREPARE
					  ...
    __set_cpus_allowed_ptr_locked()
      affine_move_task()
        task_rq_unlock();

  <PREEMPT: cpu_stopper_thread()\> 
    ack_state()
					  MULTI_STOP_RUN
					    take_cpu_down()
					      __cpu_disable();
					      stop_machine_park();
						stopper->enabled = false;
					 />
   />
	stop_one_cpu_nowait(.fn = migration_cpu_stop);
          if (stopper->enabled) // false!!!

	
That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the
stopper thread gets a chance to preempt and allows the cpu-down for
the target CPU to complete.

OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to
issue a wakeup, it must not be ran under the scheduler locks.

Solve this apparent contradiction by keeping preemption disabled over
the unlock + queue_stopper combination:

	preempt_disable();
	task_rq_unlock(...);
	if (!stop_pending)
	  stop_one_cpu_nowait(...)
	preempt_enable();

This respects the lock ordering contraints while still avoiding the
above race. That is, if we find the CPU is online under rq-lock, the
targeted stop_one_cpu_nowait() must succeed.

Apply this pattern to all similar stop_one_cpu_nowait() invocations.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
---
 kernel/sched/core.c     |   10 ++++++++--
 kernel/sched/deadline.c |    2 ++
 kernel/sched/fair.c     |    4 +++-
 3 files changed, 13 insertions(+), 3 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data
 		 * it.
 		 */
 		WARN_ON_ONCE(!pending->stop_pending);
+		preempt_disable();
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
+		preempt_enable();
 		return 0;
 	}
 out:
@@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *r
 			complete = true;
 		}
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (push_task) {
 			stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
 					    p, &rq->push_work);
 		}
+		preempt_enable();
 
 		if (complete)
 			complete_all(&pending->done);
@@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *r
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (!stop_pending) {
 			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 					    &pending->arg, &pending->stop_work);
 		}
+		preempt_enable();
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;
@@ -9459,9 +9463,11 @@ static void balance_push(struct rq *rq)
 	 * Temporarily drop rq->lock such that we can wake-up the stop task.
 	 * Both preemption and IRQs are still disabled.
 	 */
+	preempt_disable();
 	raw_spin_rq_unlock(rq);
 	stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
 			    this_cpu_ptr(&push_work));
+	preempt_enable();
 	/*
 	 * At this point need_resched() is true and we'll take the loop in
 	 * schedule(). The next pick is obviously going to be the stop task
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2420,9 +2420,11 @@ static void pull_dl_task(struct rq *this
 		double_unlock_balance(this_rq, src_rq);
 
 		if (push_task) {
+			preempt_disable();
 			raw_spin_rq_unlock(this_rq);
 			stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
 					    push_task, &src_rq->push_work);
+			preempt_enable();
 			raw_spin_rq_lock(this_rq);
 		}
 	}
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11299,13 +11299,15 @@ static int load_balance(int this_cpu, st
 				busiest->push_cpu = this_cpu;
 				active_balance = 1;
 			}
-			raw_spin_rq_unlock_irqrestore(busiest, flags);
 
+			preempt_disable();
+			raw_spin_rq_unlock_irqrestore(busiest, flags);
 			if (active_balance) {
 				stop_one_cpu_nowait(cpu_of(busiest),
 					active_load_balance_cpu_stop, busiest,
 					&busiest->active_balance_work);
 			}
+			preempt_enable();
 		}
 	} else {
 		sd->nr_balance_failed = 0;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] sched: Fix stop_one_cpu_nowait() vs hotplug
@ 2023-10-10 20:04             ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-10-10 20:04 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Tue, Oct 10, 2023 at 04:57:47PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 10, 2023 at 02:40:22PM +0000, Kuyo Chang (張建文) wrote:

> > It is running good so far(more than a week)on hotplug/set affinity
> > stress test. I will keep it testing and report back if it happens
> > again.
> 
> OK, I suppose I should look at writing a coherent Changelog for this
> then...

Something like the below... ?

---
Subject: sched: Fix stop_one_cpu_nowait() vs hotplug
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Oct 10 20:57:39 CEST 2023

Kuyo reported sporadic failures on a sched_setaffinity() vs CPU
hotplug stress-test -- notably affine_move_task() remains stuck in
wait_for_completion(), leading to a hung-task detector warning.

Specifically, it was reported that stop_one_cpu_nowait(.fn =
migration_cpu_stop) returns false -- this stopper is responsible for
the matching complete().

The race scenario is:

	CPU0					CPU1

					// doing _cpu_down()

  __set_cpus_allowed_ptr()
    task_rq_lock();
					takedown_cpu()
					  stop_machine_cpuslocked(take_cpu_down..)
	
					<PREEMPT: cpu_stopper_thread()
					  MULTI_STOP_PREPARE
					  ...
    __set_cpus_allowed_ptr_locked()
      affine_move_task()
        task_rq_unlock();

  <PREEMPT: cpu_stopper_thread()\> 
    ack_state()
					  MULTI_STOP_RUN
					    take_cpu_down()
					      __cpu_disable();
					      stop_machine_park();
						stopper->enabled = false;
					 />
   />
	stop_one_cpu_nowait(.fn = migration_cpu_stop);
          if (stopper->enabled) // false!!!

	
That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the
stopper thread gets a chance to preempt and allows the cpu-down for
the target CPU to complete.

OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to
issue a wakeup, it must not be ran under the scheduler locks.

Solve this apparent contradiction by keeping preemption disabled over
the unlock + queue_stopper combination:

	preempt_disable();
	task_rq_unlock(...);
	if (!stop_pending)
	  stop_one_cpu_nowait(...)
	preempt_enable();

This respects the lock ordering contraints while still avoiding the
above race. That is, if we find the CPU is online under rq-lock, the
targeted stop_one_cpu_nowait() must succeed.

Apply this pattern to all similar stop_one_cpu_nowait() invocations.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
---
 kernel/sched/core.c     |   10 ++++++++--
 kernel/sched/deadline.c |    2 ++
 kernel/sched/fair.c     |    4 +++-
 3 files changed, 13 insertions(+), 3 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data
 		 * it.
 		 */
 		WARN_ON_ONCE(!pending->stop_pending);
+		preempt_disable();
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
+		preempt_enable();
 		return 0;
 	}
 out:
@@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *r
 			complete = true;
 		}
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (push_task) {
 			stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
 					    p, &rq->push_work);
 		}
+		preempt_enable();
 
 		if (complete)
 			complete_all(&pending->done);
@@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *r
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (!stop_pending) {
 			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 					    &pending->arg, &pending->stop_work);
 		}
+		preempt_enable();
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;
@@ -9459,9 +9463,11 @@ static void balance_push(struct rq *rq)
 	 * Temporarily drop rq->lock such that we can wake-up the stop task.
 	 * Both preemption and IRQs are still disabled.
 	 */
+	preempt_disable();
 	raw_spin_rq_unlock(rq);
 	stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
 			    this_cpu_ptr(&push_work));
+	preempt_enable();
 	/*
 	 * At this point need_resched() is true and we'll take the loop in
 	 * schedule(). The next pick is obviously going to be the stop task
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2420,9 +2420,11 @@ static void pull_dl_task(struct rq *this
 		double_unlock_balance(this_rq, src_rq);
 
 		if (push_task) {
+			preempt_disable();
 			raw_spin_rq_unlock(this_rq);
 			stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
 					    push_task, &src_rq->push_work);
+			preempt_enable();
 			raw_spin_rq_lock(this_rq);
 		}
 	}
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11299,13 +11299,15 @@ static int load_balance(int this_cpu, st
 				busiest->push_cpu = this_cpu;
 				active_balance = 1;
 			}
-			raw_spin_rq_unlock_irqrestore(busiest, flags);
 
+			preempt_disable();
+			raw_spin_rq_unlock_irqrestore(busiest, flags);
 			if (active_balance) {
 				stop_one_cpu_nowait(cpu_of(busiest),
 					active_load_balance_cpu_stop, busiest,
 					&busiest->active_balance_work);
 			}
+			preempt_enable();
 		}
 	} else {
 		sd->nr_balance_failed = 0;

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] sched: Fix stop_one_cpu_nowait() vs hotplug
  2023-10-10 20:04             ` Peter Zijlstra
@ 2023-10-11  3:24               ` Kuyo Chang (張建文)
  -1 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang (張建文) @ 2023-10-11  3:24 UTC (permalink / raw)
  To: peterz
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Tue, 2023-10-10 at 22:04 +0200, Peter Zijlstra wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Tue, Oct 10, 2023 at 04:57:47PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 10, 2023 at 02:40:22PM +0000, Kuyo Chang (張建文) wrote:
> 
> > > It is running good so far(more than a week)on hotplug/set
> affinity
> > > stress test. I will keep it testing and report back if it happens
> > > again.
> > 
> > OK, I suppose I should look at writing a coherent Changelog for
> this
> > then...
> 
> Something like the below... ?
> 
Thanks for illustrate the race scenario. It looks good to me.
But how about RT? Does RT also need this invocations as below?

---
 kernel/sched/rt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e93b69ef919b..6aaf0a3d6081 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2063,9 +2063,11 @@ static int push_rt_task(struct rq *rq, bool
pull)
                 */
                push_task = get_push_task(rq);
                if (push_task) {
+                       preempt_disable();
                        raw_spin_rq_unlock(rq);
                        stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
                                            push_task, &rq->push_work);
+                       preempt_enable();
                        raw_spin_rq_lock(rq);
                }

@@ -2402,9 +2404,11 @@ static void pull_rt_task(struct rq *this_rq)
                double_unlock_balance(this_rq, src_rq);

                if (push_task) {
+                       preempt_disable();
                        raw_spin_rq_unlock(this_rq);
                        stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
                                            push_task, &src_rq-
>push_work);
+                       preempt_enable();
                        raw_spin_rq_lock(this_rq);
                }
        }

> ---
> Subject: sched: Fix stop_one_cpu_nowait() vs hotplug
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue Oct 10 20:57:39 CEST 2023
> 
> Kuyo reported sporadic failures on a sched_setaffinity() vs CPU
> hotplug stress-test -- notably affine_move_task() remains stuck in
> wait_for_completion(), leading to a hung-task detector warning.
> 
> Specifically, it was reported that stop_one_cpu_nowait(.fn =
> migration_cpu_stop) returns false -- this stopper is responsible for
> the matching complete().
> 
> The race scenario is:
> 
> CPU0CPU1
> 
> // doing _cpu_down()
> 
>   __set_cpus_allowed_ptr()
>     task_rq_lock();
> takedown_cpu()
>   stop_machine_cpuslocked(take_cpu_down..)
> 
> <PREEMPT: cpu_stopper_thread()
>   MULTI_STOP_PREPARE
>   ...
>     __set_cpus_allowed_ptr_locked()
>       affine_move_task()
>         task_rq_unlock();
> 
>   <PREEMPT: cpu_stopper_thread()\> 
>     ack_state()
>   MULTI_STOP_RUN
>     take_cpu_down()
>       __cpu_disable();
>       stop_machine_park();
> stopper->enabled = false;
>  />
>    />
> stop_one_cpu_nowait(.fn = migration_cpu_stop);
>           if (stopper->enabled) // false!!!
> 
> 
> That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the
> stopper thread gets a chance to preempt and allows the cpu-down for
> the target CPU to complete.
> 
> OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to
> issue a wakeup, it must not be ran under the scheduler locks.
> 
> Solve this apparent contradiction by keeping preemption disabled over
> the unlock + queue_stopper combination:
> 
> preempt_disable();
> task_rq_unlock(...);
> if (!stop_pending)
>   stop_one_cpu_nowait(...)
> preempt_enable();
> 
> This respects the lock ordering contraints while still avoiding the
> above race. That is, if we find the CPU is online under rq-lock, the
> targeted stop_one_cpu_nowait() must succeed.
> 
> Apply this pattern to all similar stop_one_cpu_nowait() invocations.
> 
> Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs
> set_cpus_allowed_ptr()")
> Reported-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Tested-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
> ---
>  kernel/sched/core.c     |   10 ++++++++--
>  kernel/sched/deadline.c |    2 ++
>  kernel/sched/fair.c     |    4 +++-
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data
>   * it.
>   */
>  WARN_ON_ONCE(!pending->stop_pending);
> +preempt_disable();
>  task_rq_unlock(rq, p, &rf);
>  stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
> +preempt_enable();
>  return 0;
>  }
>  out:
> @@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *r
>  complete = true;
>  }
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (push_task) {
>  stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
>      p, &rq->push_work);
>  }
> +preempt_enable();
>  
>  if (complete)
>  complete_all(&pending->done);
> @@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *r
>  if (flags & SCA_MIGRATE_ENABLE)
>  p->migration_flags &= ~MDF_PUSH;
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (!stop_pending) {
>  stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
>  }
> +preempt_enable();
>  
>  if (flags & SCA_MIGRATE_ENABLE)
>  return 0;
> @@ -9459,9 +9463,11 @@ static void balance_push(struct rq *rq)
>   * Temporarily drop rq->lock such that we can wake-up the stop task.
>   * Both preemption and IRQs are still disabled.
>   */
> +preempt_disable();
>  raw_spin_rq_unlock(rq);
>  stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
>      this_cpu_ptr(&push_work));
> +preempt_enable();
>  /*
>   * At this point need_resched() is true and we'll take the loop in
>   * schedule(). The next pick is obviously going to be the stop task
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2420,9 +2420,11 @@ static void pull_dl_task(struct rq *this
>  double_unlock_balance(this_rq, src_rq);
>  
>  if (push_task) {
> +preempt_disable();
>  raw_spin_rq_unlock(this_rq);
>  stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
>      push_task, &src_rq->push_work);
> +preempt_enable();
>  raw_spin_rq_lock(this_rq);
>  }
>  }
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11299,13 +11299,15 @@ static int load_balance(int this_cpu, st
>  busiest->push_cpu = this_cpu;
>  active_balance = 1;
>  }
> -raw_spin_rq_unlock_irqrestore(busiest, flags);
>  
> +preempt_disable();
> +raw_spin_rq_unlock_irqrestore(busiest, flags);
>  if (active_balance) {
>  stop_one_cpu_nowait(cpu_of(busiest),
>  active_load_balance_cpu_stop, busiest,
>  &busiest->active_balance_work);
>  }
> +preempt_enable();
>  }
>  } else {
>  sd->nr_balance_failed = 0;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] sched: Fix stop_one_cpu_nowait() vs hotplug
@ 2023-10-11  3:24               ` Kuyo Chang (張建文)
  0 siblings, 0 replies; 25+ messages in thread
From: Kuyo Chang (張建文) @ 2023-10-11  3:24 UTC (permalink / raw)
  To: peterz
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Tue, 2023-10-10 at 22:04 +0200, Peter Zijlstra wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Tue, Oct 10, 2023 at 04:57:47PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 10, 2023 at 02:40:22PM +0000, Kuyo Chang (張建文) wrote:
> 
> > > It is running good so far(more than a week)on hotplug/set
> affinity
> > > stress test. I will keep it testing and report back if it happens
> > > again.
> > 
> > OK, I suppose I should look at writing a coherent Changelog for
> this
> > then...
> 
> Something like the below... ?
> 
Thanks for illustrate the race scenario. It looks good to me.
But how about RT? Does RT also need this invocations as below?

---
 kernel/sched/rt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e93b69ef919b..6aaf0a3d6081 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2063,9 +2063,11 @@ static int push_rt_task(struct rq *rq, bool
pull)
                 */
                push_task = get_push_task(rq);
                if (push_task) {
+                       preempt_disable();
                        raw_spin_rq_unlock(rq);
                        stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
                                            push_task, &rq->push_work);
+                       preempt_enable();
                        raw_spin_rq_lock(rq);
                }

@@ -2402,9 +2404,11 @@ static void pull_rt_task(struct rq *this_rq)
                double_unlock_balance(this_rq, src_rq);

                if (push_task) {
+                       preempt_disable();
                        raw_spin_rq_unlock(this_rq);
                        stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
                                            push_task, &src_rq-
>push_work);
+                       preempt_enable();
                        raw_spin_rq_lock(this_rq);
                }
        }

> ---
> Subject: sched: Fix stop_one_cpu_nowait() vs hotplug
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue Oct 10 20:57:39 CEST 2023
> 
> Kuyo reported sporadic failures on a sched_setaffinity() vs CPU
> hotplug stress-test -- notably affine_move_task() remains stuck in
> wait_for_completion(), leading to a hung-task detector warning.
> 
> Specifically, it was reported that stop_one_cpu_nowait(.fn =
> migration_cpu_stop) returns false -- this stopper is responsible for
> the matching complete().
> 
> The race scenario is:
> 
> CPU0CPU1
> 
> // doing _cpu_down()
> 
>   __set_cpus_allowed_ptr()
>     task_rq_lock();
> takedown_cpu()
>   stop_machine_cpuslocked(take_cpu_down..)
> 
> <PREEMPT: cpu_stopper_thread()
>   MULTI_STOP_PREPARE
>   ...
>     __set_cpus_allowed_ptr_locked()
>       affine_move_task()
>         task_rq_unlock();
> 
>   <PREEMPT: cpu_stopper_thread()\> 
>     ack_state()
>   MULTI_STOP_RUN
>     take_cpu_down()
>       __cpu_disable();
>       stop_machine_park();
> stopper->enabled = false;
>  />
>    />
> stop_one_cpu_nowait(.fn = migration_cpu_stop);
>           if (stopper->enabled) // false!!!
> 
> 
> That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the
> stopper thread gets a chance to preempt and allows the cpu-down for
> the target CPU to complete.
> 
> OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to
> issue a wakeup, it must not be ran under the scheduler locks.
> 
> Solve this apparent contradiction by keeping preemption disabled over
> the unlock + queue_stopper combination:
> 
> preempt_disable();
> task_rq_unlock(...);
> if (!stop_pending)
>   stop_one_cpu_nowait(...)
> preempt_enable();
> 
> This respects the lock ordering contraints while still avoiding the
> above race. That is, if we find the CPU is online under rq-lock, the
> targeted stop_one_cpu_nowait() must succeed.
> 
> Apply this pattern to all similar stop_one_cpu_nowait() invocations.
> 
> Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs
> set_cpus_allowed_ptr()")
> Reported-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Tested-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
> ---
>  kernel/sched/core.c     |   10 ++++++++--
>  kernel/sched/deadline.c |    2 ++
>  kernel/sched/fair.c     |    4 +++-
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data
>   * it.
>   */
>  WARN_ON_ONCE(!pending->stop_pending);
> +preempt_disable();
>  task_rq_unlock(rq, p, &rf);
>  stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
> +preempt_enable();
>  return 0;
>  }
>  out:
> @@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *r
>  complete = true;
>  }
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (push_task) {
>  stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
>      p, &rq->push_work);
>  }
> +preempt_enable();
>  
>  if (complete)
>  complete_all(&pending->done);
> @@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *r
>  if (flags & SCA_MIGRATE_ENABLE)
>  p->migration_flags &= ~MDF_PUSH;
>  
> +preempt_disable();
>  task_rq_unlock(rq, p, rf);
> -
>  if (!stop_pending) {
>  stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
>      &pending->arg, &pending->stop_work);
>  }
> +preempt_enable();
>  
>  if (flags & SCA_MIGRATE_ENABLE)
>  return 0;
> @@ -9459,9 +9463,11 @@ static void balance_push(struct rq *rq)
>   * Temporarily drop rq->lock such that we can wake-up the stop task.
>   * Both preemption and IRQs are still disabled.
>   */
> +preempt_disable();
>  raw_spin_rq_unlock(rq);
>  stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
>      this_cpu_ptr(&push_work));
> +preempt_enable();
>  /*
>   * At this point need_resched() is true and we'll take the loop in
>   * schedule(). The next pick is obviously going to be the stop task
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2420,9 +2420,11 @@ static void pull_dl_task(struct rq *this
>  double_unlock_balance(this_rq, src_rq);
>  
>  if (push_task) {
> +preempt_disable();
>  raw_spin_rq_unlock(this_rq);
>  stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
>      push_task, &src_rq->push_work);
> +preempt_enable();
>  raw_spin_rq_lock(this_rq);
>  }
>  }
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11299,13 +11299,15 @@ static int load_balance(int this_cpu, st
>  busiest->push_cpu = this_cpu;
>  active_balance = 1;
>  }
> -raw_spin_rq_unlock_irqrestore(busiest, flags);
>  
> +preempt_disable();
> +raw_spin_rq_unlock_irqrestore(busiest, flags);
>  if (active_balance) {
>  stop_one_cpu_nowait(cpu_of(busiest),
>  active_load_balance_cpu_stop, busiest,
>  &busiest->active_balance_work);
>  }
> +preempt_enable();
>  }
>  } else {
>  sd->nr_balance_failed = 0;
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] sched: Fix stop_one_cpu_nowait() vs hotplug
  2023-10-11  3:24               ` Kuyo Chang (張建文)
@ 2023-10-11 13:26                 ` Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-10-11 13:26 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Wed, Oct 11, 2023 at 03:24:19AM +0000, Kuyo Chang (張建文) wrote:
> On Tue, 2023-10-10 at 22:04 +0200, Peter Zijlstra wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  On Tue, Oct 10, 2023 at 04:57:47PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 10, 2023 at 02:40:22PM +0000, Kuyo Chang (張建文) wrote:
> > 
> > > > It is running good so far(more than a week)on hotplug/set
> > affinity
> > > > stress test. I will keep it testing and report back if it happens
> > > > again.
> > > 
> > > OK, I suppose I should look at writing a coherent Changelog for
> > this
> > > then...
> > 
> > Something like the below... ?
> > 
> Thanks for illustrate the race scenario. It looks good to me.
> But how about RT? Does RT also need this invocations as below?
> 
> ---
>  kernel/sched/rt.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index e93b69ef919b..6aaf0a3d6081 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2063,9 +2063,11 @@ static int push_rt_task(struct rq *rq, bool
> pull)
>                  */
>                 push_task = get_push_task(rq);
>                 if (push_task) {
> +                       preempt_disable();
>                         raw_spin_rq_unlock(rq);
>                         stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
>                                             push_task, &rq->push_work);
> +                       preempt_enable();
>                         raw_spin_rq_lock(rq);
>                 }
> 
> @@ -2402,9 +2404,11 @@ static void pull_rt_task(struct rq *this_rq)
>                 double_unlock_balance(this_rq, src_rq);
> 
>                 if (push_task) {
> +                       preempt_disable();
>                         raw_spin_rq_unlock(this_rq);
>                         stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
>                                             push_task, &src_rq-
> >push_work);
> +                       preempt_enable();
>                         raw_spin_rq_lock(this_rq);
>                 }
>         }

bah, clearly git-grep didn't work for me last night, I'll go fix up.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] sched: Fix stop_one_cpu_nowait() vs hotplug
@ 2023-10-11 13:26                 ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-10-11 13:26 UTC (permalink / raw)
  To: Kuyo Chang (張建文)
  Cc: dietmar.eggemann, linux-kernel, linux-mediatek, rostedt,
	wsd_upstream, vschneid, bristot, juri.lelli, mingo,
	linux-arm-kernel, bsegall, mgorman, matthias.bgg,
	vincent.guittot, angelogioacchino.delregno

On Wed, Oct 11, 2023 at 03:24:19AM +0000, Kuyo Chang (張建文) wrote:
> On Tue, 2023-10-10 at 22:04 +0200, Peter Zijlstra wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  On Tue, Oct 10, 2023 at 04:57:47PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 10, 2023 at 02:40:22PM +0000, Kuyo Chang (張建文) wrote:
> > 
> > > > It is running good so far(more than a week)on hotplug/set
> > affinity
> > > > stress test. I will keep it testing and report back if it happens
> > > > again.
> > > 
> > > OK, I suppose I should look at writing a coherent Changelog for
> > this
> > > then...
> > 
> > Something like the below... ?
> > 
> Thanks for illustrate the race scenario. It looks good to me.
> But how about RT? Does RT also need this invocations as below?
> 
> ---
>  kernel/sched/rt.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index e93b69ef919b..6aaf0a3d6081 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2063,9 +2063,11 @@ static int push_rt_task(struct rq *rq, bool
> pull)
>                  */
>                 push_task = get_push_task(rq);
>                 if (push_task) {
> +                       preempt_disable();
>                         raw_spin_rq_unlock(rq);
>                         stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
>                                             push_task, &rq->push_work);
> +                       preempt_enable();
>                         raw_spin_rq_lock(rq);
>                 }
> 
> @@ -2402,9 +2404,11 @@ static void pull_rt_task(struct rq *this_rq)
>                 double_unlock_balance(this_rq, src_rq);
> 
>                 if (push_task) {
> +                       preempt_disable();
>                         raw_spin_rq_unlock(this_rq);
>                         stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
>                                             push_task, &src_rq-
> >push_work);
> +                       preempt_enable();
>                         raw_spin_rq_lock(this_rq);
>                 }
>         }

bah, clearly git-grep didn't work for me last night, I'll go fix up.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip: sched/core] sched: Fix stop_one_cpu_nowait() vs hotplug
  2023-10-10 20:04             ` Peter Zijlstra
  (?)
  (?)
@ 2023-10-13  8:06             ` tip-bot2 for Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2023-10-13  8:06 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kuyo.Chang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     f0498d2a54e7966ce23cd7c7ff42c64fa0059b07
Gitweb:        https://git.kernel.org/tip/f0498d2a54e7966ce23cd7c7ff42c64fa0059b07
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 10 Oct 2023 20:57:39 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 13 Oct 2023 09:56:29 +02:00

sched: Fix stop_one_cpu_nowait() vs hotplug

Kuyo reported sporadic failures on a sched_setaffinity() vs CPU
hotplug stress-test -- notably affine_move_task() remains stuck in
wait_for_completion(), leading to a hung-task detector warning.

Specifically, it was reported that stop_one_cpu_nowait(.fn =
migration_cpu_stop) returns false -- this stopper is responsible for
the matching complete().

The race scenario is:

	CPU0					CPU1

					// doing _cpu_down()

  __set_cpus_allowed_ptr()
    task_rq_lock();
					takedown_cpu()
					  stop_machine_cpuslocked(take_cpu_down..)

					<PREEMPT: cpu_stopper_thread()
					  MULTI_STOP_PREPARE
					  ...
    __set_cpus_allowed_ptr_locked()
      affine_move_task()
        task_rq_unlock();

  <PREEMPT: cpu_stopper_thread()\>
    ack_state()
					  MULTI_STOP_RUN
					    take_cpu_down()
					      __cpu_disable();
					      stop_machine_park();
						stopper->enabled = false;
					 />
   />
	stop_one_cpu_nowait(.fn = migration_cpu_stop);
          if (stopper->enabled) // false!!!

That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the
stopper thread gets a chance to preempt and allows the cpu-down for
the target CPU to complete.

OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to
issue a wakeup, it must not be ran under the scheduler locks.

Solve this apparent contradiction by keeping preemption disabled over
the unlock + queue_stopper combination:

	preempt_disable();
	task_rq_unlock(...);
	if (!stop_pending)
	  stop_one_cpu_nowait(...)
	preempt_enable();

This respects the lock ordering contraints while still avoiding the
above race. That is, if we find the CPU is online under rq-lock, the
targeted stop_one_cpu_nowait() must succeed.

Apply this pattern to all similar stop_one_cpu_nowait() invocations.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com>
Link: https://lkml.kernel.org/r/20231010200442.GA16515@noisy.programming.kicks-ass.net
---
 kernel/sched/core.c     | 10 ++++++++--
 kernel/sched/deadline.c |  2 ++
 kernel/sched/fair.c     |  4 +++-
 kernel/sched/rt.c       |  4 ++++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a3f9cd5..264c2eb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2645,9 +2645,11 @@ static int migration_cpu_stop(void *data)
 		 * it.
 		 */
 		WARN_ON_ONCE(!pending->stop_pending);
+		preempt_disable();
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
+		preempt_enable();
 		return 0;
 	}
 out:
@@ -2967,12 +2969,13 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			complete = true;
 		}
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (push_task) {
 			stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
 					    p, &rq->push_work);
 		}
+		preempt_enable();
 
 		if (complete)
 			complete_all(&pending->done);
@@ -3038,12 +3041,13 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 
+		preempt_disable();
 		task_rq_unlock(rq, p, rf);
-
 		if (!stop_pending) {
 			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 					    &pending->arg, &pending->stop_work);
 		}
+		preempt_enable();
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;
@@ -9421,9 +9425,11 @@ static void balance_push(struct rq *rq)
 	 * Temporarily drop rq->lock such that we can wake-up the stop task.
 	 * Both preemption and IRQs are still disabled.
 	 */
+	preempt_disable();
 	raw_spin_rq_unlock(rq);
 	stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
 			    this_cpu_ptr(&push_work));
+	preempt_enable();
 	/*
 	 * At this point need_resched() is true and we'll take the loop in
 	 * schedule(). The next pick is obviously going to be the stop task
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 7039a8d..b281144 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2420,9 +2420,11 @@ skip:
 		double_unlock_balance(this_rq, src_rq);
 
 		if (push_task) {
+			preempt_disable();
 			raw_spin_rq_unlock(this_rq);
 			stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
 					    push_task, &src_rq->push_work);
+			preempt_enable();
 			raw_spin_rq_lock(this_rq);
 		}
 	}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a751e55..38d757c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11254,13 +11254,15 @@ more_balance:
 				busiest->push_cpu = this_cpu;
 				active_balance = 1;
 			}
-			raw_spin_rq_unlock_irqrestore(busiest, flags);
 
+			preempt_disable();
+			raw_spin_rq_unlock_irqrestore(busiest, flags);
 			if (active_balance) {
 				stop_one_cpu_nowait(cpu_of(busiest),
 					active_load_balance_cpu_stop, busiest,
 					&busiest->active_balance_work);
 			}
+			preempt_enable();
 		}
 	} else {
 		sd->nr_balance_failed = 0;
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e93b69e..6aaf0a3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2063,9 +2063,11 @@ retry:
 		 */
 		push_task = get_push_task(rq);
 		if (push_task) {
+			preempt_disable();
 			raw_spin_rq_unlock(rq);
 			stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
 					    push_task, &rq->push_work);
+			preempt_enable();
 			raw_spin_rq_lock(rq);
 		}
 
@@ -2402,9 +2404,11 @@ skip:
 		double_unlock_balance(this_rq, src_rq);
 
 		if (push_task) {
+			preempt_disable();
 			raw_spin_rq_unlock(this_rq);
 			stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop,
 					    push_task, &src_rq->push_work);
+			preempt_enable();
 			raw_spin_rq_lock(this_rq);
 		}
 	}

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-10-13  8:06 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-27  3:34 [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable Kuyo Chang
2023-09-27  3:34 ` Kuyo Chang
2023-09-27  8:08 ` Peter Zijlstra
2023-09-27  8:08   ` Peter Zijlstra
2023-09-27 15:57   ` Kuyo Chang (張建文)
2023-09-27 15:57     ` Kuyo Chang (張建文)
2023-09-28 15:16     ` Peter Zijlstra
2023-09-28 15:16       ` Peter Zijlstra
2023-09-28 15:19       ` Peter Zijlstra
2023-09-28 15:19         ` Peter Zijlstra
2023-09-29 10:21     ` Peter Zijlstra
2023-09-29 10:21       ` Peter Zijlstra
2023-10-01 15:15       ` Kuyo Chang (張建文)
2023-10-01 15:15         ` Kuyo Chang (張建文)
2023-10-10 14:40       ` Kuyo Chang (張建文)
2023-10-10 14:40         ` Kuyo Chang (張建文)
2023-10-10 14:57         ` Peter Zijlstra
2023-10-10 14:57           ` Peter Zijlstra
2023-10-10 20:04           ` [PATCH] sched: Fix stop_one_cpu_nowait() vs hotplug Peter Zijlstra
2023-10-10 20:04             ` Peter Zijlstra
2023-10-11  3:24             ` Kuyo Chang (張建文)
2023-10-11  3:24               ` Kuyo Chang (張建文)
2023-10-11 13:26               ` Peter Zijlstra
2023-10-11 13:26                 ` Peter Zijlstra
2023-10-13  8:06             ` [tip: sched/core] " tip-bot2 for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.