From: Peter Zijlstra <peterz@infradead.org>
To: Kuyo Chang <kuyo.chang@mediatek.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>,
	wsd_upstream@mediatek.com, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org
Subject: Re: [PATCH 1/1] sched/core: Fix stuck on completion for affine_move_task() when stopper disable
Date: Wed, 27 Sep 2023 10:08:50 +0200
Message-ID: <20230927080850.GB21824@noisy.programming.kicks-ass.net>
In-Reply-To: <20230927033431.12406-1-kuyo.chang@mediatek.com>

On Wed, Sep 27, 2023 at 11:34:28AM +0800, Kuyo Chang wrote:
> From: kuyo chang <kuyo.chang@mediatek.com>
>
> [Syndrome] hung detect shows below warning msg
> [ 4320.666557] [ T56] khungtaskd: [name:hung_task&]INFO: task stressapptest:17803 blocked for more than 3600 seconds.
> [ 4320.666589] [ T56] khungtaskd: [name:core&]task:stressapptest state:D stack:0 pid:17803 ppid:17579 flags:0x04000008
> [ 4320.666601] [ T56] khungtaskd: Call trace:
> [ 4320.666607] [ T56] khungtaskd: __switch_to+0x17c/0x338
> [ 4320.666642] [ T56] khungtaskd: __schedule+0x54c/0x8ec
> [ 4320.666651] [ T56] khungtaskd: schedule+0x74/0xd4
> [ 4320.666656] [ T56] khungtaskd: schedule_timeout+0x34/0x108
> [ 4320.666672] [ T56] khungtaskd: do_wait_for_common+0xe0/0x154
> [ 4320.666678] [ T56] khungtaskd: wait_for_completion+0x44/0x58
> [ 4320.666681] [ T56] khungtaskd: __set_cpus_allowed_ptr_locked+0x344/0x730
> [ 4320.666702] [ T56] khungtaskd: __sched_setaffinity+0x118/0x160
> [ 4320.666709] [ T56] khungtaskd: sched_setaffinity+0x10c/0x248
> [ 4320.666715] [ T56] khungtaskd: __arm64_sys_sched_setaffinity+0x15c/0x1c0
> [ 4320.666719] [ T56] khungtaskd: invoke_syscall+0x3c/0xf8
> [ 4320.666743] [ T56] khungtaskd: el0_svc_common+0xb0/0xe8
> [ 4320.666749] [ T56] khungtaskd: do_el0_svc+0x28/0xa8
> [ 4320.666755] [ T56] khungtaskd: el0_svc+0x28/0x9c
> [ 4320.666761] [ T56] khungtaskd: el0t_64_sync_handler+0x7c/0xe4
> [ 4320.666766] [ T56] khungtaskd: el0t_64_sync+0x18c/0x190
>
> [Analysis]
>
> After add some debug footprint massage, this issue happened at stopper
> disable case.
> It cannot exec migration_cpu_stop fun to complete migration.
> This will cause stuck on wait_for_completion.

How did you get in this situation?

> Signed-off-by: kuyo chang <kuyo.chang@mediatek.com>
> ---
>  kernel/sched/core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1dc0b0287e30..98c217a1caa0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3041,8 +3041,9 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
>  		task_rq_unlock(rq, p, rf);
>
>  		if (!stop_pending) {
> -			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> -					    &pending->arg, &pending->stop_work);
> +			if (!stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
> +					    &pending->arg, &pending->stop_work))
> +				return -ENOENT;

And -ENOENT is the right return code for when the target CPU is not
available?

I suspect you're missing more than halp the picture and this is a
band-aid solution at best. Please try harder.

>  		}
>
>  		if (flags & SCA_MIGRATE_ENABLE)
> --
> 2.18.0
>