* [PATCH] kernel: cpu: Handle hotplug failure for state CPUHP_AP_IDLE_DEAD
From: Prakruthi Deepak Heragu @ 2018-09-05 19:15 UTC (permalink / raw)
To: tglx
Cc: linux-kernel, tsoni, ckadabi, bryanh, psodagud, Prakruthi Deepak Heragu
Once the teardown hotplug handler has run, the CPU is dead and enters
the CPUHP_AP_IDLE_DEAD state. Any callback that fails in the state
machine with state < CPUHP_AP_IDLE_DEAD must be treated as fatal,
because the failure can leave timers on the dead CPU unmigrated. This
leads to issues such as workqueue lockups and the sched_clock timer
appearing to wrap to zero, since sched_clock_poll, which sits in the
hrtimer base of the CPU being hotplugged, does not get migrated.
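For illustration only, the teardown walk and its existing rollback path
can be modelled in userspace. This is a rough sketch, not the kernel
code: the state names, struct model_cpu, and the model_* helpers are all
hypothetical, and the real enum cpuhp_state has many more steps.

```c
#include <stdio.h>

/* Hypothetical, simplified states; the real enum cpuhp_state has many more. */
enum model_state {
	MODEL_OFFLINE = 0,
	MODEL_STEP_A,
	MODEL_STEP_B,
	MODEL_STEP_C,
	MODEL_ONLINE,
};

struct model_cpu {
	int state;	/* current position in the state machine */
	int fail_at;	/* state whose teardown fails, or -1 for none */
};

/* Stand-in for cpuhp_invoke_callback(): returns nonzero on failure. */
static int model_invoke(struct model_cpu *c, int state, int bringup)
{
	printf("%s state %d\n", bringup ? "startup " : "teardown", state);
	return (!bringup && state == c->fail_at) ? -1 : 0;
}

/* Roll back: re-run startup for the states above the one that failed. */
static void model_undo_down(struct model_cpu *c, int prev_state)
{
	for (c->state++; c->state < prev_state; c->state++)
		model_invoke(c, c->state, 1);
}

/* Walk down towards target; on the first failure, roll back and stop. */
static int model_down(struct model_cpu *c, int target)
{
	int prev_state = c->state;
	int ret = 0;

	for (; c->state > target; c->state--) {
		ret = model_invoke(c, c->state, 0);
		if (ret) {
			model_undo_down(c, prev_state);
			break;
		}
	}
	return ret;
}
```

In this model, a teardown failure at MODEL_STEP_B leaves the CPU back at
MODEL_ONLINE. The rollback only makes sense while the CPU can still run
the startup callbacks, which is the concern once the real CPU has
already gone through CPUHP_AP_IDLE_DEAD.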
The function sched_clock_poll() updates epoch_ns and epoch_cyc. If
this function, present in the hrtimer base of the CPU being hotplugged,
is not migrated, epoch_ns and epoch_cyc are no longer updated.
Subsequently, when sched_clock() is called, it uses the stale values of
epoch_ns and epoch_cyc, which makes it look as if the timer wrapped
around.
[ 8792.168842] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=6801s workers=2 manager: 4884
[ 8792.168862] pool 16: cpus=0-7 flags=0x4 nice=0 hung=0s workers=34 idle: 4482 1390 1394 1396 4492 5442 5447 5445
[ 0.017714] Modules linked in: wlan(O)
[ 0.017733] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W O 4.9.37+ #1
[ 0.017746] task: ffffffc1b05c8080 task.stack: ffffffc1b05c4000
As seen, the time rolls over to 0 after 8792.
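The effect of a stale epoch can be sketched with a small userspace
model. It assumes a hypothetical 16-bit counter and 1 cycle == 1 ns so
that the wrap is easy to reach. The model_* names are made up for this
example, and only the "epoch plus masked cycle delta" shape is meant to
mirror the sched_clock() read path.

```c
#include <stdint.h>

/* Hypothetical 16-bit counter so the wrap is easy to reach; 1 cycle == 1 ns. */
#define MODEL_MASK 0xFFFFu

struct model_clock {
	uint64_t epoch_ns;	/* time at the last epoch update */
	uint64_t epoch_cyc;	/* counter value at the last epoch update */
};

/* The hardware counter only exposes the low bits of the true cycle count. */
static uint64_t model_hw_read(uint64_t true_cycles)
{
	return true_cycles & MODEL_MASK;
}

/* Read path: epoch plus the masked number of cycles since the epoch. */
static uint64_t model_sched_clock(const struct model_clock *c, uint64_t hw_cyc)
{
	uint64_t delta = (hw_cyc - c->epoch_cyc) & MODEL_MASK;

	return c->epoch_ns + delta;
}

/* Poll path: fold the elapsed time into the epoch before the counter wraps. */
static void model_update_epoch(struct model_clock *c, uint64_t hw_cyc)
{
	c->epoch_ns = model_sched_clock(c, hw_cyc);
	c->epoch_cyc = hw_cyc;
}
```

If model_update_epoch() is called at least once per counter period, the
returned time keeps increasing. If it is never called, a read at true
cycle 0x10010 returns 0x10, although an earlier read at 0xF000 returned
0xF000: the time appears to step backwards, which is the kind of jump
described above.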
Signed-off-by: Channagoud Kadabi <ckadabi@codeaurora.org>
Signed-off-by: Prakruthi Deepak Heragu <pheragu@codeaurora.org>
---
kernel/cpu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 0db8938..51fa38f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -837,6 +837,7 @@ static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 	for (; st->state > target; st->state--) {
 		ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
+		BUG_ON(ret && st->state < CPUHP_AP_IDLE_DEAD);
 		if (ret) {
 			st->target = prev_state;
 			undo_cpu_down(cpu, st);
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
* Re: [PATCH] kernel: cpu: Handle hotplug failure for state CPUHP_AP_IDLE_DEAD
From: Thomas Gleixner @ 2018-09-06 8:10 UTC (permalink / raw)
To: Prakruthi Deepak Heragu; +Cc: linux-kernel, tsoni, ckadabi, bryanh, psodagud
On Wed, 5 Sep 2018, Prakruthi Deepak Heragu wrote:
> Once the teardown hotplug handler has run, the CPU is dead and enters
> the CPUHP_AP_IDLE_DEAD state. Any callback that fails in the state
> machine with state < CPUHP_AP_IDLE_DEAD must be treated as fatal,
> because the failure can leave timers on the dead CPU unmigrated. This
> leads to issues such as workqueue lockups and the sched_clock timer
> appearing to wrap to zero, since sched_clock_poll, which sits in the
> hrtimer base of the CPU being hotplugged, does not get migrated.
BUG_ON() is the last resort when there is no other way out. And there is no
reason to treat such a failure as fatal unconditionally.
Why would any of those callbacks fail at all? And if that ever happens, then
we really can be smarter than just giving up.
Thanks,
tglx