All of lore.kernel.org
 help / color / mirror / Atom feed
* [WARNING] tick_handle_oneshot_broadcast() on !online CPU
@ 2019-03-21 12:40 Valentin Schneider
  2019-03-21 15:39 ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Valentin Schneider @ 2019-03-21 12:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar

Hi,

I hit this while running some hotplug stress tests on my HiKey960 with:

  babf09c3837f ("Merge tag 'tag-chrome-platform-fixes-for-v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux")

[   13.181753] WARNING: CPU: 4 PID: 29 at kernel/time/tick-broadcast.c:647 tick_handle_oneshot_broadcast+0x1b4/0x1d0
[   13.192342] Modules linked in:
[   13.195428] CPU: 4 PID: 29 Comm: migration/4 Tainted: G S                5.1.0-rc1-00015-gbabf09c3837f #23
[   13.205140] Hardware name: HiKey960 (DT)
[   13.209089] pstate: 00000085 (nzcv daIf -PAN -UAO)
[   13.213918] pc : tick_handle_oneshot_broadcast+0x1b4/0x1d0
[   13.219440] lr : tick_handle_oneshot_broadcast+0x148/0x1d0
[   13.224958] sp : ffff000010023b50
[   13.228294] x29: ffff000010023b50 x28: ffff8000ba12a000
[   13.233642] x27: ffff000012352c70 x26: ffff000011706e80
[   13.238989] x25: ffff0000114fa000 x24: 00000002fdabdc71
[   13.244335] x23: ffff000012352ce0 x22: ffff000011504080
[   13.249681] x21: ffff000012352cb0 x20: 0000000000000000
[   13.255027] x19: 7fffffffffffffff x18: 0000000000000000
[   13.260373] x17: 0000000000000000 x16: 0000000000000000
[   13.265719] x15: 000000000000043b x14: 0000000000000000
[   13.271064] x13: 0000000000000000 x12: ffff00001176e9f8
[   13.276409] x11: 00000000000000f3 x10: 0000000000000040
[   13.281755] x9 : 0000000000000000 x8 : ffff000011503808
[   13.287101] x7 : ffff000011506000 x6 : 0000000000000004
[   13.292446] x5 : 0000000000000020 x4 : 0000000000000010
[   13.297792] x3 : 0000000000000000 x2 : 0000000000000100
[   13.303138] x1 : ffff000011504080 x0 : 0000000000000000
[   13.308485] Call trace:
[   13.310951]  tick_handle_oneshot_broadcast+0x1b4/0x1d0
[   13.316130]  sp804_timer_interrupt+0x30/0x40
[   13.320432]  __handle_irq_event_percpu+0xb8/0x460
[   13.325165]  handle_irq_event_percpu+0x34/0x88
[   13.329636]  handle_irq_event+0x48/0x78
[   13.333501]  handle_fasteoi_irq+0xb4/0x188
[   13.337626]  generic_handle_irq+0x24/0x38
[   13.341661]  __handle_domain_irq+0x5c/0xb0
[   13.345789]  gic_handle_irq+0x58/0xa8
[   13.349475]  el1_irq+0xbc/0x180
[   13.352636]  __do_softirq+0xcc/0x574
[   13.356239]  irq_exit+0x13c/0x148
[   13.359577]  __handle_domain_irq+0x60/0xb0
[   13.363698]  gic_handle_irq+0x58/0xa8
[   13.367383]  el1_irq+0xbc/0x180
[   13.370549]  multi_cpu_stop+0xf8/0x170
[   13.374323]  cpu_stopper_thread+0xac/0x120
[   13.378451]  smpboot_thread_fn+0x16c/0x270
[   13.382575]  kthread+0x11c/0x120
[   13.385826]  ret_from_fork+0x10/0x18
[   13.389424] irq event stamp: 533
[   13.392676] hardirqs last  enabled at (532): [<ffff000010081c74>] __do_softirq+0xc4/0x574
[   13.400907] hardirqs last disabled at (533): [<ffff000010083c7c>] el1_irq+0x7c/0x180
[   13.408702] softirqs last  enabled at (526): [<ffff000010082024>] __do_softirq+0x474/0x574
[   13.417021] softirqs last disabled at (531): [<ffff0000100f276c>] irq_exit+0x13c/0x148
[   13.424987] ---[ end trace c9d39afcc503fe70 ]---
[   13.429773] CPU2: shutdown
[   13.437515] psci: CPU2 killed.

I stared at the code and did a bit of tracing, the sequence seems to be:

---
echo 0 > /sys/devices/system/cpu/cpu2/online

takedown_cpu()
  take_cpu_down()
     __cpu_disable() (clears CPU in cpu_online_mask & cpu_active_mask)

  [...] <warn happens here>

  tick_cleanup_dead_cpu()
    tick_shutdown_broadcast_oneshot() (removes cpu from the tick_broadcast_* masks)
---

In that case we always have

  tick_broadcast_force_mask=[CPU2]
  cpu_online_mask=[CPU4]

So tick_handle_oneshot_broadcast::tmpmask becomes [CPU2] and we hit the warn.



I was thinking of guarding the setting of tmpmask with cpu_online(cpu), but
AFAICT nothing saves us from __cpu_disable() happening *after* those checks
(and even potentially after the WARN_ON_ONCE). Do we want some extra lock
in here, or is that a benign issue?



Appended is my reproducer - it comes from a longer script that serves as a
hotplug torture test, but these few lines are enough to trigger the above.

-----8<-----

#!/bin/bash

echo 0 > /sys/devices/system/cpu/cpu1/online
sleep 0.085
echo 0 > /sys/devices/system/cpu/cpu5/online
sleep 0.08
echo 0 > /sys/devices/system/cpu/cpu2/online
sleep 0.018
echo 0 > /sys/devices/system/cpu/cpu3/online
sleep 0.05
echo 0 > /sys/devices/system/cpu/cpu0/online
sleep 0.049
echo 1 > /sys/devices/system/cpu/cpu2/online
sleep 0.064
echo 0 > /sys/devices/system/cpu/cpu6/online
sleep 0.037
echo 0 > /sys/devices/system/cpu/cpu7/online
sleep 0.033
echo 0 > /sys/devices/system/cpu/cpu2/online
sleep 0.025
echo 1 > /sys/devices/system/cpu/cpu2/online
sleep 0.018
echo 1 > /sys/devices/system/cpu/cpu7/online
sleep 0.069
echo 0 > /sys/devices/system/cpu/cpu7/online
sleep 0.036
echo 0 > /sys/devices/system/cpu/cpu4/online
sleep 0.06
echo 1 > /sys/devices/system/cpu/cpu3/online
sleep 0.067
echo 1 > /sys/devices/system/cpu/cpu4/online
sleep 0.055
echo 1 > /sys/devices/system/cpu/cpu1/online
sleep 0.052
echo 1 > /sys/devices/system/cpu/cpu6/online
sleep 0.035
echo 1 > /sys/devices/system/cpu/cpu7/online
sleep 0.053

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-03-23 17:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-21 12:40 [WARNING] tick_handle_oneshot_broadcast() on !online CPU Valentin Schneider
2019-03-21 15:39 ` Thomas Gleixner
2019-03-21 16:31   ` Valentin Schneider
2019-03-23 11:25   ` [tip:timers/core] tick: Remove outgoing CPU from broadcast masks tip-bot for Thomas Gleixner
2019-03-23 17:28   ` tip-bot for Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.