LKML Archive on lore.kernel.org
* Crashes in perf_event_ctx_lock_nested
@ 2017-10-30 22:45 Guenter Roeck
  2017-10-31 13:48 ` Peter Zijlstra
  2017-10-31 18:48 ` Crashes in perf_event_ctx_lock_nested Don Zickus
  0 siblings, 2 replies; 20+ messages in thread
From: Guenter Roeck @ 2017-10-30 22:45 UTC
  To: Thomas Gleixner; +Cc: linux-kernel, Don Zickus, Ingo Molnar

Hi Thomas,

we are seeing the following crash in v4.14-rc5/rc7 if CONFIG_HARDLOCKUP_DETECTOR
is enabled.

[    5.908021] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[    5.915836]
==================================================================
[    5.917325] Unsafe core_pattern used with fs.suid_dumpable=2.
[    5.917325] Pipe handler or fully qualified core dump path required.
[    5.917325] Set kernel.core_pattern before fs.suid_dumpable.
[    5.924046] udevd[147]: starting version 225
[    5.948520] BUG: KASAN: null-ptr-deref in perf_event_ctx_lock_nested.isra.71+0x22/0x89
[    5.957380] Read of size 8 at addr 0000000000000208 by task watchdog/2/21
[    5.964973]
[    5.966646] CPU: 2 PID: 21 Comm: watchdog/2 Not tainted 4.14.0-rc7 #30
[    5.973947] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.95.0 09/27/2017
[    5.982128] Call Trace:
[    5.984874]  dump_stack+0x4d/0x63
[    5.988585]  kasan_report+0x24b/0x295
[    5.992691]  ? watchdog_nmi_enable+0x12/0x12
[    5.997477]  __asan_load8+0x81/0x83
[    6.001388]  perf_event_ctx_lock_nested.isra.71+0x22/0x89
[    6.007437]  perf_event_enable+0xf/0x27
[    6.011737]  hardlockup_detector_perf_enable+0x3e/0x40
[    6.017493]  watchdog_nmi_enable+0xe/0x12
[    6.021990]  watchdog_enable+0x8c/0xc5
[    6.026195]  smpboot_thread_fn+0x27a/0x3c7
[    6.030788]  ? sort_range+0x22/0x22
[    6.034701]  kthread+0x221/0x231
[    6.038321]  ? kthread_flush_work+0x120/0x120
[    6.043204]  ret_from_fork+0x22/0x30
[    6.047207]
==================================================================
...
[    6.134561] BUG: unable to handle kernel NULL pointer dereference at 0000000000000208
[    6.143316] IP: perf_event_ctx_lock_nested.isra.71+0x22/0x89
[    6.149645] PGD 0 P4D 0 
[    6.152478] Oops: 0000 [#1] PREEMPT SMP KASAN
[    6.157350] Modules linked in:
[    6.160766] CPU: 2 PID: 21 Comm: watchdog/2 Tainted: G    B 4.14.0-rc7 #30
[    6.169422] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.95.0 09/27/2017
[    6.177583] task: ffff8803eacd1100 task.stack: ffff8803eacf8000
[    6.184206] RIP: 0010:perf_event_ctx_lock_nested.isra.71+0x22/0x89
[    6.191118] RSP: 0018:ffff8803eacffe10 EFLAGS: 00010246
[    6.196962] RAX: 0000000000000296 RBX: 0000000000000000 RCX: ffffffffa52ee8a5
[    6.204941] RDX: d8ecf37b519af800 RSI: 0000000000000003 RDI: ffffffffa6274610
[    6.212911] RBP: ffff8803eacffe30 R08: dffffc0000000000 R09: 0000000000000000
[    6.220888] R10: ffffed007d59ffa9 R11: ffffc9000044c1a1 R12: 0000000000000000
[    6.228867] R13: ffff8803eacd1100 R14: 0000000000000208 R15: ffffffffa535ce54
[    6.231476] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode.  Opts: commit=600
[    6.237449] EXT4-fs (mmcblk0p8): mounted filesystem with ordered data mode.  Opts: (null)
[    6.255332] FS:  0000000000000000(0000) GS:ffff8803ed500000(0000) knlGS:0000000000000000
[    6.264384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.270812] CR2: 0000000000000208 CR3: 0000000430615001 CR4: 00000000003606e0
[    6.278789] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.286761] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.294741] Call Trace:
[    6.297480]  perf_event_enable+0xf/0x27
[    6.301771]  hardlockup_detector_perf_enable+0x3e/0x40
[    6.307515]  watchdog_nmi_enable+0xe/0x12 
[    6.311990]  watchdog_enable+0x8c/0xc5
[    6.316176]  smpboot_thread_fn+0x27a/0x3c7
[    6.320757]  ? sort_range+0x22/0x22
[    6.324650]  kthread+0x221/0x231
[    6.328251]  ? kthread_flush_work+0x120/0x120
[    6.333114]  ret_from_fork+0x22/0x30
[    6.337107] Code: a5 e8 70 58 f6 ff 5b 5d c3 55 48 89 e5 41 56 4c 8d b7 08 02
00 00 41 55 41 54 49 89 fc 53 e8 1a e9 f5 ff 4c 89 f7 e8 8d d3 07 00 <49> 8b 9c
24 08 02 00 00 31 d2 be 01 00 00 00 48 8d bb b0 00 00
[    6.358230] RIP: perf_event_ctx_lock_nested.isra.71+0x22/0x89 RSP: ffff8803eacffe10
[    6.366779] CR2: 0000000000000208
[    6.370477] ---[ end trace ff68e1917f0a2044 ]---
[    6.383531] Kernel panic - not syncing: Fatal exception
[    6.389640] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
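
The fault address can be cross-checked against the register dump above. R12 is 0, CR2 is 0x208, and the bytes at the marked instruction (49 8b 9c 24 08 02 00 00) decode to an 8-byte load from 0x208(%r12) into %rbx. This suggests a NULL pointer was passed in and a member at offset 0x208 was read through it. Below is a minimal userspace sketch of that arithmetic. The struct layout is a hypothetical stand-in, with the offset taken from the oops rather than from the kernel headers, and the assumption that the member is the event's ctx pointer is not confirmed by the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the structure dereferenced in the oops. */
struct fake_ctx;

struct fake_event {
	unsigned char pad[0x208];   /* everything before the accessed member */
	struct fake_ctx *ctx;       /* assumed: the member read at offset 0x208 */
};

/* The offset is copied from the oops (CR2 with a zero base register). */
static_assert(offsetof(struct fake_event, ctx) == 0x208,
	      "member must sit at the offset seen in CR2");

/* Address the CPU touches when reading ->ctx through a given base pointer. */
static uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct fake_event, ctx);
}
```

Calling fault_address(0) yields 0x208, matching both the KASAN report and CR2, which is consistent with the per-CPU event pointer being NULL at the time of the call.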

The problem is a heisenbug - slight changes in the code, such as added logging,
can make it disappear.

I added some logging and a long msleep() in hardlockup_detector_perf_cleanup().
Here is the result:

[    0.274361] NMI watchdog: ############ hardlockup_detector_perf_init
[    0.274915] NMI watchdog: ############ hardlockup_detector_event_create(0)
[    0.277049] NMI watchdog: ############ hardlockup_detector_perf_cleanup
[    0.277593] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
[    0.278027] NMI watchdog: ############ hardlockup_detector_event_create(0)
[    1.312044] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
[    1.385122] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
[    1.386028] NMI watchdog: ############ hardlockup_detector_event_create(1)
[    1.466102] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
[    1.475536] NMI watchdog: ############ hardlockup_detector_event_create(2)
[    1.535099] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
[    1.535101] NMI watchdog: ############ hardlockup_detector_event_create(3)
[    7.222816] NMI watchdog: ############ hardlockup_detector_perf_disable(0)
[    7.230567] NMI watchdog: ############ hardlockup_detector_perf_disable(1)
[    7.243138] NMI watchdog: ############ hardlockup_detector_perf_disable(2)
[    7.250966] NMI watchdog: ############ hardlockup_detector_perf_disable(3)
[    7.258826] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
[    7.258827] NMI watchdog: ############ hardlockup_detector_perf_cleanup
[    7.258831] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
[    7.258833] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
[    7.258834] NMI watchdog: ############ hardlockup_detector_event_create(2)
[    7.258835] NMI watchdog: ############ hardlockup_detector_event_create(0)
[    7.260169] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
[    7.260170] NMI watchdog: ############ hardlockup_detector_event_create(3)
[    7.494251] NMI watchdog: ############ hardlockup_detector_event_create(1)
[    8.287135] NMI watchdog: ############ hardlockup_detector_perf_cleanup done

Looks like there are a number of problems: hardlockup_detector_event_create()
creates the event data structure even if it is already created, and
hardlockup_detector_perf_cleanup() runs unprotected and in parallel to
the enable/create functions.

Also, the following message is seen twice.

[    0.278758] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[    7.258838] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.

I don't offer a proposed patch since I have no idea how to best solve the
problem.

Also, is the repeated enable/disable/cleanup as part of the normal boot
really necessary?

Thanks,
Guenter


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-30 22:45 Crashes in perf_event_ctx_lock_nested Guenter Roeck
@ 2017-10-31 13:48 ` Peter Zijlstra
  2017-10-31 17:16   ` Guenter Roeck
  2017-10-31 21:32   ` Thomas Gleixner
  2017-10-31 18:48 ` Crashes in perf_event_ctx_lock_nested Don Zickus
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-10-31 13:48 UTC
  To: Guenter Roeck; +Cc: Thomas Gleixner, linux-kernel, Don Zickus, Ingo Molnar

On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> I added some logging and a long msleep() in hardlockup_detector_perf_cleanup().
> Here is the result:
> 
> [    0.274361] NMI watchdog: ############ hardlockup_detector_perf_init
> [    0.274915] NMI watchdog: ############ hardlockup_detector_event_create(0)
> [    0.277049] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> [    0.277593] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> [    0.278027] NMI watchdog: ############ hardlockup_detector_event_create(0)
> [    1.312044] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> [    1.385122] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> [    1.386028] NMI watchdog: ############ hardlockup_detector_event_create(1)
> [    1.466102] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> [    1.475536] NMI watchdog: ############ hardlockup_detector_event_create(2)
> [    1.535099] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> [    1.535101] NMI watchdog: ############ hardlockup_detector_event_create(3)

> [    7.222816] NMI watchdog: ############ hardlockup_detector_perf_disable(0)
> [    7.230567] NMI watchdog: ############ hardlockup_detector_perf_disable(1)
> [    7.243138] NMI watchdog: ############ hardlockup_detector_perf_disable(2)
> [    7.250966] NMI watchdog: ############ hardlockup_detector_perf_disable(3)
> [    7.258826] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> [    7.258827] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> [    7.258831] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> [    7.258833] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> [    7.258834] NMI watchdog: ############ hardlockup_detector_event_create(2)
> [    7.258835] NMI watchdog: ############ hardlockup_detector_event_create(0)
> [    7.260169] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> [    7.260170] NMI watchdog: ############ hardlockup_detector_event_create(3)
> [    7.494251] NMI watchdog: ############ hardlockup_detector_event_create(1)
> [    8.287135] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> 
> Looks like there are a number of problems: hardlockup_detector_event_create()
> creates the event data structure even if it is already created, 

Right, that does look dodgy. And on its own it should be fairly
straightforward to cure. But I'd like to understand the rest of it first.

> and hardlockup_detector_perf_cleanup() runs unprotected and in
> parallel to the enable/create functions.

Well, looking at the code, cpu_maps_update_begin() aka.
cpu_add_remove_lock is serializing cpu_up() and cpu_down() and _should_
thereby also serialize cleanup vs the smp_hotplug_thread operations.

Your trace does indeed indicate this is not the case, but I cannot, from
the code, see how this could happen.

Could you use trace_printk() instead and boot with
"trace_options=stacktrace" ?

> Also, the following message is seen twice.
> 
> [    0.278758] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> [    7.258838] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> 
> I don't offer a proposed patch since I have no idea how to best solve the
> problem.
> 
> Also, is the repeated enable/disable/cleanup as part of the normal boot
> really necessary ?

That's weird, I don't see that on my machines. We very much only bring
up the CPUs _once_. Also note they're 7s apart. Did you do something
funny like resume-from-disk or so?


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-31 13:48 ` Peter Zijlstra
@ 2017-10-31 17:16   ` Guenter Roeck
  2017-10-31 18:50     ` Don Zickus
  2017-10-31 21:32   ` Thomas Gleixner
  1 sibling, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2017-10-31 17:16 UTC
  To: Peter Zijlstra; +Cc: Thomas Gleixner, linux-kernel, Don Zickus, Ingo Molnar

On Tue, Oct 31, 2017 at 02:48:50PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> > I added some logging and a long msleep() in hardlockup_detector_perf_cleanup().
> > Here is the result:
> > 
> > [    0.274361] NMI watchdog: ############ hardlockup_detector_perf_init
> > [    0.274915] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > [    0.277049] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> > [    0.277593] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> > [    0.278027] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > [    1.312044] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> > [    1.385122] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> > [    1.386028] NMI watchdog: ############ hardlockup_detector_event_create(1)
> > [    1.466102] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> > [    1.475536] NMI watchdog: ############ hardlockup_detector_event_create(2)
> > [    1.535099] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> > [    1.535101] NMI watchdog: ############ hardlockup_detector_event_create(3)
> 
> > [    7.222816] NMI watchdog: ############ hardlockup_detector_perf_disable(0)
> > [    7.230567] NMI watchdog: ############ hardlockup_detector_perf_disable(1)
> > [    7.243138] NMI watchdog: ############ hardlockup_detector_perf_disable(2)
> > [    7.250966] NMI watchdog: ############ hardlockup_detector_perf_disable(3)
> > [    7.258826] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> > [    7.258827] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> > [    7.258831] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> > [    7.258833] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> > [    7.258834] NMI watchdog: ############ hardlockup_detector_event_create(2)
> > [    7.258835] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > [    7.260169] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> > [    7.260170] NMI watchdog: ############ hardlockup_detector_event_create(3)
> > [    7.494251] NMI watchdog: ############ hardlockup_detector_event_create(1)
> > [    8.287135] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> > 
> > Looks like there are a number of problems: hardlockup_detector_event_create()
> > creates the event data structure even if it is already created, 
> 
> Right, that does look dodgy. And on its own it should be fairly
> straightforward to cure. But I'd like to understand the rest of it first.
> 
> > and hardlockup_detector_perf_cleanup() runs unprotected and in
> > parallel to the enable/create functions.
> 
> Well, looking at the code, cpu_maps_update_begin() aka.
> cpu_add_remove_lock is serializing cpu_up() and cpu_down() and _should_
> thereby also serialize cleanup vs the smp_hotplug_thread operations.
> 
> Your trace does indeed indicate this is not the case, but I cannot, from
> the code, see how this could happen.
> 
> Could you use trace_printk() instead and boot with
> "trace_options=stacktrace" ?
> 
Attached. Let me know if you need more information. Note this is with
msleep(1000) in the cleanup function to avoid the crash.

> > Also, the following message is seen twice.
> > 
> > [    0.278758] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> > [    7.258838] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> > 
> > I don't offer a proposed patch since I have no idea how to best solve the
> > problem.
> > 
> > Also, is the repeated enable/disable/cleanup as part of the normal boot
> > really necessary ?
> 
> That's weird, I don't see that on my machines. We very much only bring
> up the CPUs _once_. Also note they're 7s apart. Did you do something
> funny like resume-from-disk or so?

No, just whatever Chrome OS does when it starts the kernel. The hardware
used in this test is a Google Pixelbook, though we have also seen the problem
with other Chromebooks.

Guenter

---
# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
       swapper/0-1     [000] ....     0.350933: hardlockup_detector_perf_init: ############ hardlockup_detector_perf_init
       swapper/0-1     [000] ....     0.350938: <stack trace>
 => kernel_init_freeable
 => kernel_init
 => ret_from_fork
       swapper/0-1     [000] ....     0.350942: hardlockup_detector_event_create: ############ hardlockup_detector_event_create(0)
       swapper/0-1     [000] ....     0.350946: <stack trace>
 => kernel_init_freeable
 => kernel_init
 => ret_from_fork
       swapper/0-1     [000] ....     0.352637: hardlockup_detector_perf_cleanup: ############ hardlockup_detector_perf_cleanup
       swapper/0-1     [000] ....     0.352641: <stack trace>
 => kernel_init_freeable
 => kernel_init
 => ret_from_fork
      watchdog/0-12    [000] ....     0.352649: hardlockup_detector_perf_enable: ############ hardlockup_detector_perf_enable(0)
      watchdog/0-12    [000] ....     0.352653: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/0-12    [000] ....     0.352655: hardlockup_detector_event_create: ############ hardlockup_detector_event_create(0)
      watchdog/0-12    [000] ....     0.352658: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
       swapper/0-1     [000] ....     1.394555: hardlockup_detector_perf_cleanup: ############ hardlockup_detector_perf_cleanup done
       swapper/0-1     [000] ....     1.394559: <stack trace>
 => kernel_init_freeable
 => kernel_init
 => ret_from_fork
      watchdog/1-15    [001] ....     1.534624: hardlockup_detector_perf_enable: ############ hardlockup_detector_perf_enable(1)
      watchdog/1-15    [001] ....     1.534636: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/1-15    [001] ....     1.534640: hardlockup_detector_event_create: ############ hardlockup_detector_event_create(1)
      watchdog/1-15    [001] ....     1.534646: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
      watchdog/2-21    [002] ....     1.637496: hardlockup_detector_perf_enable: ############ hardlockup_detector_perf_enable(2)
      watchdog/2-21    [002] ....     1.637505: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/2-21    [002] ....     1.637507: hardlockup_detector_event_create: ############ hardlockup_detector_event_create(2)
      watchdog/2-21    [002] ....     1.637510: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
      watchdog/3-27    [003] ....     1.742245: hardlockup_detector_perf_enable: ############ hardlockup_detector_perf_enable(3)
      watchdog/3-27    [003] ....     1.742253: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/3-27    [003] ....     1.742255: hardlockup_detector_event_create: ############ hardlockup_detector_event_create(3)
      watchdog/3-27    [003] ....     1.742258: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
      watchdog/0-12    [000] ....     7.535105: hardlockup_detector_perf_disable: ############ hardlockup_detector_perf_disable(0)
      watchdog/0-12    [000] ....     7.535108: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/1-15    [001] ....     7.535136: hardlockup_detector_perf_disable: ############ hardlockup_detector_perf_disable(1)
      watchdog/1-15    [001] ....     7.535138: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/2-21    [002] ....     7.535155: hardlockup_detector_perf_disable: ############ hardlockup_detector_perf_disable(2)
      watchdog/2-21    [002] ....     7.535157: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/3-27    [003] ....     7.535188: hardlockup_detector_perf_disable: ############ hardlockup_detector_perf_disable(3)
      watchdog/3-27    [003] ....     7.535190: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/2-21    [002] ....     7.535206: hardlockup_detector_perf_enable: ############ hardlockup_detector_perf_enable(2)
      watchdog/2-21    [002] ....     7.535221: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/2-21    [002] ....     7.535222: hardlockup_detector_event_create: ############ hardlockup_detector_event_create(2)
      watchdog/2-21    [002] ....     7.535223: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
      watchdog/2-21    [002] ....     7.535224: hardlockup_detector_event_create: ############## perf event for CPU 2 already created, skipping
      watchdog/2-21    [002] ....     7.535225: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
      watchdog/0-12    [000] ....     7.535225: hardlockup_detector_perf_enable: ############ hardlockup_detector_perf_enable(0)
      watchdog/1-15    [001] ....     7.535225: hardlockup_detector_perf_enable: ############ hardlockup_detector_perf_enable(1)
      watchdog/1-15    [001] ....     7.535228: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/0-12    [000] ....     7.535228: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/0-12    [000] ....     7.535229: hardlockup_detector_event_create: ############ hardlockup_detector_event_create(0)
      watchdog/1-15    [001] ....     7.535229: hardlockup_detector_event_create: ############ hardlockup_detector_event_create(1)
      watchdog/0-12    [000] ....     7.535232: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
      watchdog/1-15    [001] ....     7.535232: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
      watchdog/0-12    [000] ....     7.535233: hardlockup_detector_event_create: ############## perf event for CPU 0 already created, skipping
      watchdog/1-15    [001] ....     7.535233: hardlockup_detector_event_create: ############## perf event for CPU 1 already created, skipping
      watchdog/0-12    [000] ....     7.535236: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
      watchdog/1-15    [001] ....     7.535236: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
          sysctl-148   [000] ....     7.536879: hardlockup_detector_perf_cleanup: ############ hardlockup_detector_perf_cleanup
          sysctl-148   [000] ....     7.536881: <stack trace>
 => proc_watchdog_thresh
 => proc_sys_call_handler
 => proc_sys_write
 => __vfs_write
 => vfs_write
 => SyS_write
 => entry_SYSCALL_64_fastpath
      watchdog/3-27    [003] ....     7.536888: hardlockup_detector_perf_enable: ############ hardlockup_detector_perf_enable(3)
      watchdog/3-27    [003] ....     7.536890: <stack trace>
 => kthread
 => ret_from_fork
      watchdog/3-27    [003] ....     7.536891: hardlockup_detector_event_create: ############ hardlockup_detector_event_create(3)
      watchdog/3-27    [003] ....     7.536892: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
      watchdog/3-27    [003] ....     7.536893: hardlockup_detector_event_create: ############## perf event for CPU 3 already created, skipping
      watchdog/3-27    [003] ....     7.536895: <stack trace>
 => smpboot_thread_fn
 => kthread
 => ret_from_fork
          sysctl-148   [000] ....     8.551925: hardlockup_detector_perf_cleanup: ############ hardlockup_detector_perf_cleanup done
          sysctl-148   [000] ....     8.551928: <stack trace>
 => proc_watchdog_thresh
 => proc_sys_call_handler
 => proc_sys_write
 => __vfs_write
 => vfs_write
 => SyS_write
 => entry_SYSCALL_64_fastpath


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-30 22:45 Crashes in perf_event_ctx_lock_nested Guenter Roeck
  2017-10-31 13:48 ` Peter Zijlstra
@ 2017-10-31 18:48 ` Don Zickus
  1 sibling, 0 replies; 20+ messages in thread
From: Don Zickus @ 2017-10-31 18:48 UTC
  To: Guenter Roeck; +Cc: Thomas Gleixner, linux-kernel, Ingo Molnar

On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> Hi Thomas,
> 
> we are seeing the following crash in v4.14-rc5/rc7 if CONFIG_HARDLOCKUP_DETECTOR
> is enabled.
> 
> [    5.908021] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> [    5.915836]
> ==================================================================
> [    5.917325] Unsafe core_pattern used with fs.suid_dumpable=2.
> [    5.917325] Pipe handler or fully qualified core dump path required.
> [    5.917325] Set kernel.core_pattern before fs.suid_dumpable.
> [    5.924046] udevd[147]: starting version 225
> [    5.948520] BUG: KASAN: null-ptr-deref in perf_event_ctx_lock_nested.isra.71+0x22/0x89
> [    5.957380] Read of size 8 at addr 0000000000000208 by task watchdog/2/21
> [    5.964973]
> [    5.966646] CPU: 2 PID: 21 Comm: watchdog/2 Not tainted 4.14.0-rc7 #30
> [    5.973947] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.95.0 09/27/2017
> [    5.982128] Call Trace:
> [    5.984874]  dump_stack+0x4d/0x63
> [    5.988585]  kasan_report+0x24b/0x295
> [    5.992691]  ? watchdog_nmi_enable+0x12/0x12
> [    5.997477]  __asan_load8+0x81/0x83
> [    6.001388]  perf_event_ctx_lock_nested.isra.71+0x22/0x89
> [    6.007437]  perf_event_enable+0xf/0x27
> [    6.011737]  hardlockup_detector_perf_enable+0x3e/0x40
> [    6.017493]  watchdog_nmi_enable+0xe/0x12
> [    6.021990]  watchdog_enable+0x8c/0xc5
> [    6.026195]  smpboot_thread_fn+0x27a/0x3c7
> [    6.030788]  ? sort_range+0x22/0x22
> [    6.034701]  kthread+0x221/0x231
> [    6.038321]  ? kthread_flush_work+0x120/0x120
> [    6.043204]  ret_from_fork+0x22/0x30
> [    6.047207]
> ==================================================================
> ...
> [    6.134561] BUG: unable to handle kernel NULL pointer dereference at 0000000000000208
> [    6.143316] IP: perf_event_ctx_lock_nested.isra.71+0x22/0x89
> [    6.149645] PGD 0 P4D 0 
> [    6.152478] Oops: 0000 [#1] PREEMPT SMP KASAN
> [    6.157350] Modules linked in:
> [    6.160766] CPU: 2 PID: 21 Comm: watchdog/2 Tainted: G    B 4.14.0-rc7 #30
> [    6.169422] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.95.0 09/27/2017
> [    6.177583] task: ffff8803eacd1100 task.stack: ffff8803eacf8000
> [    6.184206] RIP: 0010:perf_event_ctx_lock_nested.isra.71+0x22/0x89
> [    6.191118] RSP: 0018:ffff8803eacffe10 EFLAGS: 00010246
> [    6.196962] RAX: 0000000000000296 RBX: 0000000000000000 RCX: ffffffffa52ee8a5
> [    6.204941] RDX: d8ecf37b519af800 RSI: 0000000000000003 RDI: ffffffffa6274610
> [    6.212911] RBP: ffff8803eacffe30 R08: dffffc0000000000 R09: 0000000000000000
> [    6.220888] R10: ffffed007d59ffa9 R11: ffffc9000044c1a1 R12: 0000000000000000
> [    6.228867] R13: ffff8803eacd1100 R14: 0000000000000208 R15: ffffffffa535ce54
> [    6.231476] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode.  Opts: commit=600
> [    6.237449] EXT4-fs (mmcblk0p8): mounted filesystem with ordered data mode.  Opts: (null)
> [    6.255332] FS:  0000000000000000(0000) GS:ffff8803ed500000(0000) knlGS:0000000000000000
> [    6.264384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    6.270812] CR2: 0000000000000208 CR3: 0000000430615001 CR4: 00000000003606e0
> [    6.278789] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    6.286761] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    6.294741] Call Trace:
> [    6.297480]  perf_event_enable+0xf/0x27
> [    6.301771]  hardlockup_detector_perf_enable+0x3e/0x40
> [    6.307515]  watchdog_nmi_enable+0xe/0x12 
> [    6.311990]  watchdog_enable+0x8c/0xc5
> [    6.316176]  smpboot_thread_fn+0x27a/0x3c7
> [    6.320757]  ? sort_range+0x22/0x22
> [    6.324650]  kthread+0x221/0x231
> [    6.328251]  ? kthread_flush_work+0x120/0x120
> [    6.333114]  ret_from_fork+0x22/0x30
> [    6.337107] Code: a5 e8 70 58 f6 ff 5b 5d c3 55 48 89 e5 41 56 4c 8d b7 08 02
> 00 00 41 55 41 54 49 89 fc 53 e8 1a e9 f5 ff 4c 89 f7 e8 8d d3 07 00 <49> 8b 9c
> 24 08 02 00 00 31 d2 be 01 00 00 00 48 8d bb b0 00 00
> [    6.358230] RIP: perf_event_ctx_lock_nested.isra.71+0x22/0x89 RSP: ffff8803eacffe10
> [    6.366779] CR2: 0000000000000208
> [    6.370477] ---[ end trace ff68e1917f0a2044 ]---
> [    6.383531] Kernel panic - not syncing: Fatal exception
> [    6.389640] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> 
> The problem is a heisenbug - slight changes in the code, such as added logging,
> can make it disappear.
> 
> I added some logging and a long msleep() in hardlockup_detector_perf_cleanup().
> Here is the result:
> 
> [    0.274361] NMI watchdog: ############ hardlockup_detector_perf_init
> [    0.274915] NMI watchdog: ############ hardlockup_detector_event_create(0)
> [    0.277049] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> [    0.277593] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> [    0.278027] NMI watchdog: ############ hardlockup_detector_event_create(0)
> [    1.312044] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> [    1.385122] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> [    1.386028] NMI watchdog: ############ hardlockup_detector_event_create(1)
> [    1.466102] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> [    1.475536] NMI watchdog: ############ hardlockup_detector_event_create(2)
> [    1.535099] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> [    1.535101] NMI watchdog: ############ hardlockup_detector_event_create(3)
> [    7.222816] NMI watchdog: ############ hardlockup_detector_perf_disable(0)
> [    7.230567] NMI watchdog: ############ hardlockup_detector_perf_disable(1)
> [    7.243138] NMI watchdog: ############ hardlockup_detector_perf_disable(2)
> [    7.250966] NMI watchdog: ############ hardlockup_detector_perf_disable(3)
> [    7.258826] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> [    7.258827] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> [    7.258831] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> [    7.258833] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> [    7.258834] NMI watchdog: ############ hardlockup_detector_event_create(2)
> [    7.258835] NMI watchdog: ############ hardlockup_detector_event_create(0)
> [    7.260169] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> [    7.260170] NMI watchdog: ############ hardlockup_detector_event_create(3)
> [    7.494251] NMI watchdog: ############ hardlockup_detector_event_create(1)
> [    8.287135] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> 
> Looks like there are a number of problems: hardlockup_detector_event_create()
> creates the event data structure even if it is already created, and
> hardlockup_detector_perf_cleanup() runs unprotected and in parallel to
> the enable/create functions.

Hi,

Thomas created a deferred cleanup mechanism to help with hotplugging, so yes,
an attempt is made to re-create an event if the deferred cleanup hasn't run
yet.  There is probably a bug there: the re-created event needs to be removed
from the deferred 'dead_events_mask'.  And maybe some locking is needed there
too, though I don't know if adding locking around that queue re-introduces
the deadlock to the hotplug code.  See commit
941154bd6937a710ae9193a3c733c0029e5ae7b8 for details.


> 
> Also, the following message is seen twice.
> 
> [    0.278758] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> [    7.258838] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> 
> I don't offer a proposed patch since I have no idea how to best solve the
> problem.
> 
> Also, is the repeated enable/disable/cleanup as part of the normal boot
> really necessary ?

Yes, part of it.  The first event_create(0) is a hardware check: does the
hardware support perf counters?  The event is removed after the check.  If
the check passes, all cpus are allowed to create the event, hence the second
event_create(0).  Otherwise, the counters are prevented from being enabled
on other cpus.

What doesn't make sense is the perf_disable(0-3) at the 7 second mark.  Not
sure why that happens.  But it seems to cause the problem by exposing a race
of disabling the event and then re-creating it.

What is odd, is the panic.

	hardlockup_detector_perf_enable ->
		hardlockup_detector_event_create
		perf_event_enable ->
			panic()

due to the delay of hardlockup_detector_perf_cleanup(), I would have assumed
hardlockup_detector_event_create() would have warned and returned
had the cleanup not been invoked yet.

The only race I can see here:

CPU A						CPU B
						hardlockup_detector_perf_cleanup
hardlockup_detector_perf_enable				 perf_event_release_kernel
	hardlockup_detector_event_create
							per_cpu(watchdog_ev, cpu) = NULL;
	perf_event_enable(this_cpu_read(watchdog_ev))
					^^^NULL


I am guessing adding a check in hardlockup_detector_perf_enable() would
prove that, but as you said, it might make the problem go into hiding.
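
For illustration, the interleaving in the diagram above can be replayed in a
small userspace C sketch.  Everything here is hypothetical: struct fake_event
stands in for struct perf_event, a plain global models one CPU's watchdog_ev
slot, and enable_checked() is the kind of NULL check mentioned above, not
code taken from kernel/watchdog_hld.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct perf_event. */
struct fake_event {
	int enabled;
};

/* Models one CPU's per-cpu watchdog_ev slot. */
static struct fake_event *watchdog_ev;

/* CPU A, step 1: create the event and publish it in the slot. */
static void event_create(struct fake_event *ev)
{
	assert(ev != NULL);
	watchdog_ev = ev;
}

/* CPU B: tail of the deferred cleanup, which clears the slot. */
static void cleanup_clear_slot(void)
{
	watchdog_ev = NULL;
}

/*
 * CPU A, step 2: re-read the slot and enable the event.
 * Returns 0 on success, -1 if the slot was cleared in between.
 */
static int enable_checked(void)
{
	struct fake_event *ev = watchdog_ev;

	if (!ev)
		return -1;
	ev->enabled = 1;
	return 0;
}
```

Calling event_create(), then cleanup_clear_slot(), then enable_checked()
replays the ^^^NULL step: the check returns -1 instead of dereferencing the
cleared slot.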


The below patch probably doesn't solve the problem; it just makes the race
window a lot smaller.  Though I don't know if this causes more problems
inside perf by temporarily having two events attached.

Cheers,
Don


diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 71a62ceacdc8..0b9bd1e0bf57 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -221,12 +221,16 @@ void hardlockup_detector_perf_cleanup(void)
 		struct perf_event *event = per_cpu(watchdog_ev, cpu);
 
 		/*
+ 		 * Clear immediately to avoid delay of releasing event.
+ 		 */
+		per_cpu(watchdog_ev, cpu) = NULL;
+
+		/*
 		 * Required because for_each_cpu() reports  unconditionally
 		 * CPU0 as set on UP kernels. Sigh.
 		 */
 		if (event)
 			perf_event_release_kernel(event);
-		per_cpu(watchdog_ev, cpu) = NULL;
 	}
 	cpumask_clear(&dead_events_mask);
 }


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-31 17:16   ` Guenter Roeck
@ 2017-10-31 18:50     ` Don Zickus
  2017-10-31 20:12       ` Guenter Roeck
  0 siblings, 1 reply; 20+ messages in thread
From: Don Zickus @ 2017-10-31 18:50 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Ingo Molnar

On Tue, Oct 31, 2017 at 10:16:22AM -0700, Guenter Roeck wrote:
> On Tue, Oct 31, 2017 at 02:48:50PM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> > > I added some logging and a long msleep() in hardlockup_detector_perf_cleanup().
> > > Here is the result:
> > > 
> > > [    0.274361] NMI watchdog: ############ hardlockup_detector_perf_init
> > > [    0.274915] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > > [    0.277049] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> > > [    0.277593] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> > > [    0.278027] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > > [    1.312044] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> > > [    1.385122] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> > > [    1.386028] NMI watchdog: ############ hardlockup_detector_event_create(1)
> > > [    1.466102] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> > > [    1.475536] NMI watchdog: ############ hardlockup_detector_event_create(2)
> > > [    1.535099] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> > > [    1.535101] NMI watchdog: ############ hardlockup_detector_event_create(3)
> > 
> > > [    7.222816] NMI watchdog: ############ hardlockup_detector_perf_disable(0)
> > > [    7.230567] NMI watchdog: ############ hardlockup_detector_perf_disable(1)
> > > [    7.243138] NMI watchdog: ############ hardlockup_detector_perf_disable(2)
> > > [    7.250966] NMI watchdog: ############ hardlockup_detector_perf_disable(3)
> > > [    7.258826] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> > > [    7.258827] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> > > [    7.258831] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> > > [    7.258833] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> > > [    7.258834] NMI watchdog: ############ hardlockup_detector_event_create(2)
> > > [    7.258835] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > > [    7.260169] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> > > [    7.260170] NMI watchdog: ############ hardlockup_detector_event_create(3)
> > > [    7.494251] NMI watchdog: ############ hardlockup_detector_event_create(1)
> > > [    8.287135] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> > > 
> > > Looks like there are a number of problems: hardlockup_detector_event_create()
> > > creates the event data structure even if it is already created, 
> > 
> > Right, that does look dodgy. And on its own should be fairly straight
> > forward to cure. But I'd like to understand the rest of it first.
> > 
> > > and hardlockup_detector_perf_cleanup() runs unprotected and in
> > > parallel to the enable/create functions.
> > 
> > Well, looking at the code, cpu_maps_update_begin() aka.
> > cpu_add_remove_lock is serializing cpu_up() and cpu_down() and _should_
> > thereby also serialize cleanup vs the smp_hotplug_thread operations.
> > 
> > Your trace does indeed indicate this is not the case, but I cannot, from
> > the code, see how this could happen.
> > 
> > Could you use trace_printk() instead and boot with
> > "trace_options=stacktrace" ?
> > 
> Attached. Let me know if you need more information. Note this is with
> msleep(1000) in the cleanup function to avoid the crash.
> 
> > > Also, the following message is seen twice.
> > > 
> > > [    0.278758] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> > > [    7.258838] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> > > 
> > > I don't offer a proposed patch since I have no idea how to best solve the
> > > problem.
> > > 
> > > Also, is the repeated enable/disable/cleanup as part of the normal boot
> > > really necessary ?
> > 
> > That's weird, I don't see that on my machines. We very much only bring
> > up the CPUs _once_. Also note they're 7s apart. Did you do something
> > funny like resume-from-disk or so?
> 
> No, just whatever Chrome OS does when it starts the kernel. The hardware
> used in this test is a Google Pixelbook, though we have also seen the problem
> with other Chromebooks.

Is Chrome OS changing the default timeout from 10s to something else?
That would explain it, as a script is executed late in the boot cycle, and
would explain the quick restart.

Cheers,
Don


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-31 18:50     ` Don Zickus
@ 2017-10-31 20:12       ` Guenter Roeck
  2017-10-31 20:23         ` Don Zickus
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2017-10-31 20:12 UTC (permalink / raw)
  To: Don Zickus; +Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Ingo Molnar

On Tue, Oct 31, 2017 at 02:50:59PM -0400, Don Zickus wrote:
> On Tue, Oct 31, 2017 at 10:16:22AM -0700, Guenter Roeck wrote:
> > On Tue, Oct 31, 2017 at 02:48:50PM +0100, Peter Zijlstra wrote:
> > > On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> > > > I added some logging and a long msleep() in hardlockup_detector_perf_cleanup().
> > > > Here is the result:
> > > > 
> > > > [    0.274361] NMI watchdog: ############ hardlockup_detector_perf_init
> > > > [    0.274915] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > > > [    0.277049] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> > > > [    0.277593] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> > > > [    0.278027] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > > > [    1.312044] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> > > > [    1.385122] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> > > > [    1.386028] NMI watchdog: ############ hardlockup_detector_event_create(1)
> > > > [    1.466102] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> > > > [    1.475536] NMI watchdog: ############ hardlockup_detector_event_create(2)
> > > > [    1.535099] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> > > > [    1.535101] NMI watchdog: ############ hardlockup_detector_event_create(3)
> > > 
> > > > [    7.222816] NMI watchdog: ############ hardlockup_detector_perf_disable(0)
> > > > [    7.230567] NMI watchdog: ############ hardlockup_detector_perf_disable(1)
> > > > [    7.243138] NMI watchdog: ############ hardlockup_detector_perf_disable(2)
> > > > [    7.250966] NMI watchdog: ############ hardlockup_detector_perf_disable(3)
> > > > [    7.258826] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> > > > [    7.258827] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> > > > [    7.258831] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> > > > [    7.258833] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> > > > [    7.258834] NMI watchdog: ############ hardlockup_detector_event_create(2)
> > > > [    7.258835] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > > > [    7.260169] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> > > > [    7.260170] NMI watchdog: ############ hardlockup_detector_event_create(3)
> > > > [    7.494251] NMI watchdog: ############ hardlockup_detector_event_create(1)
> > > > [    8.287135] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> > > > 
> > > > Looks like there are a number of problems: hardlockup_detector_event_create()
> > > > creates the event data structure even if it is already created, 
> > > 
> > > Right, that does look dodgy. And on its own should be fairly straight
> > > forward to cure. But I'd like to understand the rest of it first.
> > > 
> > > > and hardlockup_detector_perf_cleanup() runs unprotected and in
> > > > parallel to the enable/create functions.
> > > 
> > > Well, looking at the code, cpu_maps_update_begin() aka.
> > > cpu_add_remove_lock is serializing cpu_up() and cpu_down() and _should_
> > > thereby also serialize cleanup vs the smp_hotplug_thread operations.
> > > 
> > > Your trace does indeed indicate this is not the case, but I cannot, from
> > > the code, see how this could happen.
> > > 
> > > Could you use trace_printk() instead and boot with
> > > "trace_options=stacktrace" ?
> > > 
> > Attached. Let me know if you need more information. Note this is with
> > msleep(1000) in the cleanup function to avoid the crash.
> > 
> > > > Also, the following message is seen twice.
> > > > 
> > > > [    0.278758] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> > > > [    7.258838] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> > > > 
> > > > I don't offer a proposed patch since I have no idea how to best solve the
> > > > problem.
> > > > 
> > > > Also, is the repeated enable/disable/cleanup as part of the normal boot
> > > > really necessary ?
> > > 
> > > That's weird, I don't see that on my machines. We very much only bring
> > > up the CPUs _once_. Also note they're 7s apart. Did you do something
> > > funny like resume-from-disk or so?
> > 
> > No, just whatever Chrome OS does when it starts the kernel. The hardware
> > used in this test is a Google Pixelbook, though we have also seen the problem
> > with other Chromebooks.
> 
> Is Chrome OS changing the default timeout from 10s to something else?
> That would explain it, as a script is executed late in the boot cycle, and
> would explain the quick restart.
> 

Correct, Chrome OS changes the timeout from 10 to 5 seconds.

A little experiment suggests that the problem can be triggered by updating
/proc/sys/kernel/watchdog_thresh. hardlockup_detector_perf_enable() is
called while hardlockup_detector_perf_cleanup() is running.

Guenter


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-31 20:12       ` Guenter Roeck
@ 2017-10-31 20:23         ` Don Zickus
  0 siblings, 0 replies; 20+ messages in thread
From: Don Zickus @ 2017-10-31 20:23 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Ingo Molnar

> > Is Chrome OS changing the default timeout from 10s to something else?
> > That would explain it, as a script is executed late in the boot cycle, and
> > would explain the quick restart.
> > 
> 
> Correct, Chrome OS changes the timeout from 10 to 5 seconds.
> 
> A little experiment suggests that the problem can be triggered by updating
> /proc/sys/kernel/watchdog_thresh. hardlockup_detector_perf_enable() is
> called while hardlockup_detector_perf_cleanup() is running.

Ok, that makes sense then.  Though I thought I tested that before acking it.
I will try to duplicate that on my end and see if various solutions could
work.

Cheers,
Don


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-31 13:48 ` Peter Zijlstra
  2017-10-31 17:16   ` Guenter Roeck
@ 2017-10-31 21:32   ` Thomas Gleixner
  2017-10-31 22:11     ` Guenter Roeck
                       ` (3 more replies)
  1 sibling, 4 replies; 20+ messages in thread
From: Thomas Gleixner @ 2017-10-31 21:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Guenter Roeck, linux-kernel, Don Zickus, Ingo Molnar

On Tue, 31 Oct 2017, Peter Zijlstra wrote:
> On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> > I added some logging and a long msleep() in hardlockup_detector_perf_cleanup().
> > Here is the result:
> > 
> > [    0.274361] NMI watchdog: ############ hardlockup_detector_perf_init
> > [    0.274915] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > [    0.277049] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> > [    0.277593] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> > [    0.278027] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > [    1.312044] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> > [    1.385122] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> > [    1.386028] NMI watchdog: ############ hardlockup_detector_event_create(1)
> > [    1.466102] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> > [    1.475536] NMI watchdog: ############ hardlockup_detector_event_create(2)
> > [    1.535099] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> > [    1.535101] NMI watchdog: ############ hardlockup_detector_event_create(3)
> 
> > [    7.222816] NMI watchdog: ############ hardlockup_detector_perf_disable(0)
> > [    7.230567] NMI watchdog: ############ hardlockup_detector_perf_disable(1)
> > [    7.243138] NMI watchdog: ############ hardlockup_detector_perf_disable(2)
> > [    7.250966] NMI watchdog: ############ hardlockup_detector_perf_disable(3)
> > [    7.258826] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> > [    7.258827] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> > [    7.258831] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> > [    7.258833] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> > [    7.258834] NMI watchdog: ############ hardlockup_detector_event_create(2)
> > [    7.258835] NMI watchdog: ############ hardlockup_detector_event_create(0)
> > [    7.260169] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> > [    7.260170] NMI watchdog: ############ hardlockup_detector_event_create(3)
> > [    7.494251] NMI watchdog: ############ hardlockup_detector_event_create(1)
> > [    8.287135] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> > 
> > Looks like there are a number of problems: hardlockup_detector_event_create()
> > creates the event data structure even if it is already created, 
> 
> Right, that does look dodgy. And on its own should be fairly straight
> forward to cure. But I'd like to understand the rest of it first.
>
> > and hardlockup_detector_perf_cleanup() runs unprotected and in
> > parallel to the enable/create functions.

We have the serialization via watchdog_mutex, which is held around all
invocations of lockup_detector_reconfigure().

So we have the following scenarios:

boot and the proc interface:

    lock(watchdog_mutex);
    lockup_detector_reconfigure();
        cpus_read_lock();
	stop()
	  park threads();
	update();
	start()
	  unpark threads();
	cpus_read_unlock();
	cleanup();
    unlock(watchdog_mutex);
	
cpu hotplug:

    cpu_maps_update_begin()
    cpus_write_lock()
    cpu_up()
	thread_unpark()
	    start()
    cpus_write_unlock()
    cpu_maps_update_end()

cpu hotunplug:

    cpu_maps_update_begin()
    cpus_write_lock()
    cpu_down()
	thread_park()
	    stop()
    cpus_write_unlock()
    lock(watchdog_mutex);
    cleanup();
    unlock(watchdog_mutex);
    cpu_maps_update_end()

But there is a problem:

The unpark of the smpboot thread is not synchronous. It's just unparked so
there is no guarantee when it runs.

If it starts running _before_ the cleanup happened then it will create an
event and overwrite the dead event pointer. The new event is then cleaned
up because the event is marked dead. The park side is safe as that actually
waits for the thread to reach parked state.
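
The overwrite described above can be replayed deterministically in
userspace.  This is only a sketch under stated assumptions: struct fake_event
and struct cpu_slots are made-up stand-ins, the steps are called by hand in
the problematic order rather than by real threads, and the "split" variant
models keeping the dead pointer in a slot of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct perf_event; only tracks release state. */
struct fake_event {
	bool released;
};

/* One CPU's state. dead_event is only used by the split variant. */
struct cpu_slots {
	struct fake_event *watchdog_ev;
	struct fake_event *dead_event;
	bool marked_dead;	/* models this CPU's bit in dead_events_mask */
};

/* disable(), single-slot variant: the old pointer stays in watchdog_ev. */
static void disable_single(struct cpu_slots *c)
{
	if (c->watchdog_ev)
		c->marked_dead = true;
}

/* disable(), split variant: the old pointer moves to dead_event. */
static void disable_split(struct cpu_slots *c)
{
	if (c->watchdog_ev) {
		c->dead_event = c->watchdog_ev;
		c->watchdog_ev = NULL;
		c->marked_dead = true;
	}
}

/* enable() as run by the unparked thread: installs a fresh event. */
static void enable(struct cpu_slots *c, struct fake_event *fresh)
{
	assert(fresh != NULL);
	c->watchdog_ev = fresh;
}

/* cleanup(), single-slot variant: releases whatever watchdog_ev holds. */
static void cleanup_single(struct cpu_slots *c)
{
	if (!c->marked_dead)
		return;
	if (c->watchdog_ev)
		c->watchdog_ev->released = true;
	c->watchdog_ev = NULL;
	c->marked_dead = false;
}

/* cleanup(), split variant: releases only the parked-away dead event. */
static void cleanup_split(struct cpu_slots *c)
{
	if (!c->marked_dead)
		return;
	if (c->dead_event)
		c->dead_event->released = true;
	c->dead_event = NULL;
	c->marked_dead = false;
}
```

Running disable, enable, cleanup in that order with the single-slot functions
releases the fresh event and leaves the old one dangling; the same order with
the split functions releases only the old event and keeps the fresh one in
watchdog_ev.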

That means we can have the following situation:

    lock(watchdog_mutex);
    lockup_detector_reconfigure();
        cpus_read_lock();
	stop();
	   park()
	update();
	start();
	   unpark()
	cpus_read_unlock();		thread runs()
	cleanup();						
    unlock(watchdog_mutex);

Duh. /me feels outright stupid.

So we have to revert

a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")

Patch attached.

Thanks,

	tglx

8<------------------
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -21,6 +21,7 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
@@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(vo
 
 	if (event) {
 		perf_event_disable(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
 		watchdog_cpus--;
 	}
@@ -218,7 +221,7 @@ void hardlockup_detector_perf_cleanup(vo
 	int cpu;
 
 	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+		struct perf_event *event = per_cpu(dead_event, cpu);
 
 		/*
 		 * Required because for_each_cpu() reports  unconditionally
@@ -226,7 +229,7 @@ void hardlockup_detector_perf_cleanup(vo
 		 */
 		if (event)
 			perf_event_release_kernel(event);
-		per_cpu(watchdog_ev, cpu) = NULL;
+		per_cpu(dead_event, cpu) = NULL;
 	}
 	cpumask_clear(&dead_events_mask);
 }


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-31 21:32   ` Thomas Gleixner
@ 2017-10-31 22:11     ` Guenter Roeck
  2017-11-01 18:11       ` Don Zickus
  2017-11-01 18:22       ` Crashes in perf_event_ctx_lock_nested Thomas Gleixner
  2017-11-01  8:14     ` Peter Zijlstra
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 20+ messages in thread
From: Guenter Roeck @ 2017-10-31 22:11 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Peter Zijlstra, linux-kernel, Don Zickus, Ingo Molnar

On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:

[ ...] 

> So we have to revert
> 
> a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> 
> Patch attached.
> 

Tested-by: Guenter Roeck <linux@roeck-us.net>

There is still a problem. When running

echo 6 > /proc/sys/kernel/watchdog_thresh
echo 5 > /proc/sys/kernel/watchdog_thresh

repeatedly, the message

NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.

stops after a while (after ~10-30 iterations, with fluctuations).
After adding trace messages into hardlockup_detector_perf_disable()
and hardlockup_detector_perf_enable(), I see:

hardlockup_detector_perf_disable: disable(0): Number of CPUs: 3
hardlockup_detector_perf_disable: disable(1): Number of CPUs: 2
hardlockup_detector_perf_disable: disable(2): Number of CPUs: 1
hardlockup_detector_perf_disable: disable(3): Number of CPUs: 0
...
hardlockup_detector_perf_disable: disable(0): Number of CPUs: 2
hardlockup_detector_perf_disable: disable(1): Number of CPUs: 1
hardlockup_detector_perf_disable: disable(2): Number of CPUs: 0
hardlockup_detector_perf_disable: disable(3): Number of CPUs: -1
...
hardlockup_detector_perf_enable: enable(1): Number of CPUs: -6
hardlockup_detector_perf_enable: enable(3): Number of CPUs: -5
hardlockup_detector_perf_enable: enable(2): Number of CPUs: -4
hardlockup_detector_perf_enable: enable(0): Number of CPUs: -3

Maybe watchdog_cpus needs to be atomic ?

Guenter


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-31 21:32   ` Thomas Gleixner
  2017-10-31 22:11     ` Guenter Roeck
@ 2017-11-01  8:14     ` Peter Zijlstra
  2017-11-01  8:26       ` Thomas Gleixner
  2017-11-01 19:46     ` [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") tip-bot for Thomas Gleixner
  2017-11-01 20:27     ` tip-bot for Thomas Gleixner
  3 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2017-11-01  8:14 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Guenter Roeck, linux-kernel, Don Zickus, Ingo Molnar

On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
> That means we can have the following situation:
> 
>     lock(watchdog_mutex);
>     lockup_detector_reconfigure();
>         cpus_read_lock();
> 	stop();
> 	   park()
> 	update();
> 	start();
> 	   unpark()
> 	cpus_read_unlock();		thread runs()
> 	cleanup();						
>     unlock(watchdog_mutex);
> 

Isn't there also a case where hardlockup_detector_perf_init() creates an
event to 'probe' stuff, and then hardlockup_detector_perf_enable()
_again_ creates the event?


* Re: Crashes in perf_event_ctx_lock_nested
  2017-11-01  8:14     ` Peter Zijlstra
@ 2017-11-01  8:26       ` Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2017-11-01  8:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Guenter Roeck, linux-kernel, Don Zickus, Ingo Molnar


On Wed, 1 Nov 2017, Peter Zijlstra wrote:

> On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
> > That means we can have the following situation:
> > 
> >     lock(watchdog_mutex);
> >     lockup_detector_reconfigure();
> >         cpus_read_lock();
> > 	stop();
> > 	   park()
> > 	update();
> > 	start();
> > 	   unpark()
> > 	cpus_read_unlock();		thread runs()
> > 	cleanup();						
> >     unlock(watchdog_mutex);
> > 
> 
> Isn't there also a case where hardlockup_detector_perf_init() creates an
> event to 'probe' stuff, and then hardlockup_detector_perf_enable()
> _again_ creates the event?

probe() releases the event.


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-31 22:11     ` Guenter Roeck
@ 2017-11-01 18:11       ` Don Zickus
  2017-11-01 18:34         ` Guenter Roeck
                           ` (2 more replies)
  2017-11-01 18:22       ` Crashes in perf_event_ctx_lock_nested Thomas Gleixner
  1 sibling, 3 replies; 20+ messages in thread
From: Don Zickus @ 2017-11-01 18:11 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Thomas Gleixner, Peter Zijlstra, linux-kernel, Ingo Molnar

On Tue, Oct 31, 2017 at 03:11:07PM -0700, Guenter Roeck wrote:
> On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
> 
> [ ...] 
> 
> > So we have to revert
> > 
> > a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> > 
> > Patch attached.
> > 
> 
> Tested-by: Guenter Roeck <linux@roeck-us.net>
> 
> There is still a problem. When running
> 
> echo 6 > /proc/sys/kernel/watchdog_thresh
> echo 5 > /proc/sys/kernel/watchdog_thresh
> 
> repeatedly, the message
> 
> NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> 
> stops after a while (after ~10-30 iterations, with fluctuations).
> After adding trace messages into hardlockup_detector_perf_disable()
> and hardlockup_detector_perf_enable(), I see:
> 
> hardlockup_detector_perf_disable: disable(0): Number of CPUs: 3
> hardlockup_detector_perf_disable: disable(1): Number of CPUs: 2
> hardlockup_detector_perf_disable: disable(2): Number of CPUs: 1
> hardlockup_detector_perf_disable: disable(3): Number of CPUs: 0
> ...
> hardlockup_detector_perf_disable: disable(0): Number of CPUs: 2
> hardlockup_detector_perf_disable: disable(1): Number of CPUs: 1
> hardlockup_detector_perf_disable: disable(2): Number of CPUs: 0
> hardlockup_detector_perf_disable: disable(3): Number of CPUs: -1
> ...
> hardlockup_detector_perf_enable: enable(1): Number of CPUs: -6
> hardlockup_detector_perf_enable: enable(3): Number of CPUs: -5
> hardlockup_detector_perf_enable: enable(2): Number of CPUs: -4
> hardlockup_detector_perf_enable: enable(0): Number of CPUs: -3
> 
> Maybe watchdog_cpus needs to be atomic ?

I switched it to atomic and it solves that problem.  The functionality isn't
broken currently, just the informational message.
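
As a userspace illustration of why the plain counter drifts, the sketch below
replays one assumed interleaving of two CPUs running "watchdog_cpus++" and
contrasts it with an atomic fetch-and-add.  C11 atomic_fetch_add() is used
here as an analogue of the kernel's atomic_fetch_inc(); both return the value
from before the increment.  The function names are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Plain counter: "++" is a separate load and store. */
static int plain_cpus;

/* Replay of two CPUs racing through "plain_cpus++": both load first. */
static void racy_enable_two_cpus(void)
{
	int seen_by_a = plain_cpus;	/* CPU A loads the old value */
	int seen_by_b = plain_cpus;	/* CPU B loads the same old value */

	plain_cpus = seen_by_a + 1;	/* CPU A stores old + 1 */
	plain_cpus = seen_by_b + 1;	/* CPU B stores old + 1: A's update is lost */
}

static void plain_disable(void)
{
	plain_cpus--;
}

/* Atomic counter: load, add and store happen as one indivisible step. */
static atomic_int atomic_cpus;

/* Returns true only for the caller that saw the counter at zero. */
static bool atomic_enable(void)
{
	return atomic_fetch_add(&atomic_cpus, 1) == 0;
}

static void atomic_disable(void)
{
	int old = atomic_fetch_sub(&atomic_cpus, 1);

	/* A balanced enable/disable sequence never drops below zero. */
	assert(old > 0);
}
```

After racy_enable_two_cpus() followed by two plain_disable() calls the plain
counter ends at -1, the same kind of drift as in the trace above, while two
atomic_enable() calls followed by two atomic_disable() calls return to 0 and
only the first enabler sees "true".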

Patch attached to try.

Cheers,
Don

---8<--------------

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index a7f137c1933a..8ee4da223b35 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -14,6 +14,7 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/sched/debug.h>
+#include <linux/atomic.h>
 
 #include <asm/irq_regs.h>
 #include <linux/perf_event.h>
@@ -25,7 +26,7 @@ static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static unsigned int watchdog_cpus;
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -189,7 +190,8 @@ void hardlockup_detector_perf_enable(void)
 	if (hardlockup_detector_event_create())
 		return;
 
-	if (!watchdog_cpus++)
+	/* use original value for check */
+	if (!atomic_fetch_inc(&watchdog_cpus))
 		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
 
 	perf_event_enable(this_cpu_read(watchdog_ev));
@@ -207,7 +209,7 @@ void hardlockup_detector_perf_disable(void)
 		this_cpu_write(watchdog_ev, NULL);
 		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
-		watchdog_cpus--;
+		atomic_dec(&watchdog_cpus);
 	}
 }
 


* Re: Crashes in perf_event_ctx_lock_nested
  2017-10-31 22:11     ` Guenter Roeck
  2017-11-01 18:11       ` Don Zickus
@ 2017-11-01 18:22       ` Thomas Gleixner
  1 sibling, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2017-11-01 18:22 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Peter Zijlstra, linux-kernel, Don Zickus, Ingo Molnar

On Tue, 31 Oct 2017, Guenter Roeck wrote:
> On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
> 
> [ ...] 
> 
> > So we have to revert
> > 
> > a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> > 
> > Patch attached.
> > 
> 
> Tested-by: Guenter Roeck <linux@roeck-us.net>
> 
> There is still a problem. When running
> 
> echo 6 > /proc/sys/kernel/watchdog_thresh
> echo 5 > /proc/sys/kernel/watchdog_thresh
> 
> repeatedly, the message
> 
> NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> 
> stops after a while (after ~10-30 iterations, with fluctuations).
> After adding trace messages into hardlockup_detector_perf_disable()
> and hardlockup_detector_perf_enable(), I see:
> 
> hardlockup_detector_perf_disable: disable(0): Number of CPUs: 3
> hardlockup_detector_perf_disable: disable(1): Number of CPUs: 2
> hardlockup_detector_perf_disable: disable(2): Number of CPUs: 1
> hardlockup_detector_perf_disable: disable(3): Number of CPUs: 0
> ...
> hardlockup_detector_perf_disable: disable(0): Number of CPUs: 2
> hardlockup_detector_perf_disable: disable(1): Number of CPUs: 1
> hardlockup_detector_perf_disable: disable(2): Number of CPUs: 0
> hardlockup_detector_perf_disable: disable(3): Number of CPUs: -1
> ...
> hardlockup_detector_perf_enable: enable(1): Number of CPUs: -6
> hardlockup_detector_perf_enable: enable(3): Number of CPUs: -5
> hardlockup_detector_perf_enable: enable(2): Number of CPUs: -4
> hardlockup_detector_perf_enable: enable(0): Number of CPUs: -3
> 
> Maybe watchdog_cpus needs to be atomic ?

Indeed.

Thanks,

	tglx


* Re: Crashes in perf_event_ctx_lock_nested
  2017-11-01 18:11       ` Don Zickus
@ 2017-11-01 18:34         ` Guenter Roeck
  2017-11-01 19:46         ` [tip:core/urgent] watchdog/hardlockup/perf: Use atomics to track in-use cpu counter tip-bot for Don Zickus
  2017-11-01 20:28         ` tip-bot for Don Zickus
  2 siblings, 0 replies; 20+ messages in thread
From: Guenter Roeck @ 2017-11-01 18:34 UTC (permalink / raw)
  To: Don Zickus; +Cc: Thomas Gleixner, Peter Zijlstra, linux-kernel, Ingo Molnar

On Wed, Nov 01, 2017 at 02:11:27PM -0400, Don Zickus wrote:
> > 
> > Maybe watchdog_cpus needs to be atomic ?
> 
> I switched it to atomic and it solves that problem.  The functionality isn't
> broken currently, just the informational message.
> 
> Patch attached to try.
> 

Tested-by: Guenter Roeck <linux@roeck-us.net>

Thanks!
Guenter

> Cheers,
> Don
> 
> ---8<--------------
> 
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index a7f137c1933a..8ee4da223b35 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -14,6 +14,7 @@
>  #include <linux/nmi.h>
>  #include <linux/module.h>
>  #include <linux/sched/debug.h>
> +#include <linux/atomic.h>

Alphabetical order, maybe ?

>  
>  #include <asm/irq_regs.h>
>  #include <linux/perf_event.h>
> @@ -25,7 +26,7 @@ static DEFINE_PER_CPU(struct perf_event *, dead_event);
>  static struct cpumask dead_events_mask;
>  
>  static unsigned long hardlockup_allcpu_dumped;
> -static unsigned int watchdog_cpus;
> +static atomic_t watchdog_cpus = ATOMIC_INIT(0);
>  
>  void arch_touch_nmi_watchdog(void)
>  {
> @@ -189,7 +190,8 @@ void hardlockup_detector_perf_enable(void)
>  	if (hardlockup_detector_event_create())
>  		return;
>  
> -	if (!watchdog_cpus++)
> +	/* use original value for check */
> +	if (!atomic_fetch_inc(&watchdog_cpus))
>  		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
>  
>  	perf_event_enable(this_cpu_read(watchdog_ev));
> @@ -207,7 +209,7 @@ void hardlockup_detector_perf_disable(void)
>  		this_cpu_write(watchdog_ev, NULL);
>  		this_cpu_write(dead_event, event);
>  		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
> -		watchdog_cpus--;
> +		atomic_dec(&watchdog_cpus);
>  	}
>  }
>  


* [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
  2017-10-31 21:32   ` Thomas Gleixner
  2017-10-31 22:11     ` Guenter Roeck
  2017-11-01  8:14     ` Peter Zijlstra
@ 2017-11-01 19:46     ` tip-bot for Thomas Gleixner
  2017-11-01 20:32       ` Guenter Roeck
  2017-11-01 20:27     ` tip-bot for Thomas Gleixner
  3 siblings, 1 reply; 20+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-11-01 19:46 UTC
  To: linux-tip-commits; +Cc: linux-kernel, peterz, mingo, hpa, dzickus, linux, tglx

Commit-ID:  1c294733b7b9f712f78d15cfa75ffdea72b79abb
Gitweb:     https://git.kernel.org/tip/1c294733b7b9f712f78d15cfa75ffdea72b79abb
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 31 Oct 2017 22:32:00 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 1 Nov 2017 20:41:27 +0100

watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")

Guenter reported a crash in the watchdog/perf code, which is caused by
cleanup() and enable() running concurrently. The reason for this is:

The watchdog functions are serialized via the watchdog_mutex and cpu
hotplug locking, but the enable of the perf based watchdog happens in
context of the unpark callback of the smpboot thread. But that unpark
function is not synchronous inside the locking. The unparking of the thread
just wakes it up and leaves so there is no guarantee when the thread is
executing.

If it starts running _before_ the cleanup has happened, it will create an
event and overwrite the dead event pointer. The new event is then cleaned
up because the event is marked dead.

    lock(watchdog_mutex);
    lockup_detector_reconfigure();
        cpus_read_lock();
        stop();
           park()
        update();
        start();
           unpark()
        cpus_read_unlock();             thread runs()
                                          overwrite dead event ptr
        cleanup();
          free new event, which is active inside perf....
    unlock(watchdog_mutex);

The park side is safe as that actually waits for the thread to reach
parked state.

Commit a33d44843d45 removed the protection against this kind of scenario
under the stupid assumption that the hotplug serialization and the
watchdog_mutex cover everything. 

Bring it back.
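
For illustration only (not part of the commit): a minimal userspace sketch of
the two-slot scheme this revert restores. The names sketch_enable(),
sketch_disable(), sketch_cleanup(), sketch_replay(), live_ev, dead_ev and the
static pool are hypothetical stand-ins for the per-CPU watchdog_ev and
dead_event pointers and for the perf calls. It is single threaded and only
replays the problematic ordering (disable, early re-enable, cleanup).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS   4   /* hypothetical CPU count for the sketch */
#define POOL_SIZE 16  /* hypothetical upper bound on created events */

/* Stand-in for struct perf_event; "released" marks a destroyed event. */
struct event {
	bool released;
};

static struct event pool[POOL_SIZE];
static size_t pool_used;

static struct event *live_ev[NR_CPUS];   /* role of watchdog_ev */
static struct event *dead_ev[NR_CPUS];   /* role of dead_event */
static bool dead_mask[NR_CPUS];          /* role of dead_events_mask */

/* Stand-in for event creation; hands out slots from a static pool. */
static struct event *event_create(void)
{
	assert(pool_used < POOL_SIZE);
	pool[pool_used].released = false;
	return &pool[pool_used++];
}

/* Runs in the (asynchronously unparked) per-CPU thread. */
void sketch_enable(int cpu)
{
	live_ev[cpu] = event_create();
}

/* Runs in the per-CPU thread when it is parked. */
void sketch_disable(int cpu)
{
	struct event *ev = live_ev[cpu];

	if (ev) {
		live_ev[cpu] = NULL;    /* live slot is free for a new event */
		dead_ev[cpu] = ev;      /* old event waits for deferred destroy */
		dead_mask[cpu] = true;
	}
}

/* Runs in the reconfiguring context; touches only the dead slots. */
void sketch_cleanup(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!dead_mask[cpu])
			continue;
		if (dead_ev[cpu])
			dead_ev[cpu]->released = true;
		dead_ev[cpu] = NULL;
		dead_mask[cpu] = false;
	}
}

/* Replays: disable, thread re-enables early, then cleanup. */
bool sketch_replay(int cpu)
{
	struct event *old, *fresh;

	sketch_enable(cpu);
	old = live_ev[cpu];
	sketch_disable(cpu);
	sketch_enable(cpu);             /* unparked thread wins the race */
	fresh = live_ev[cpu];
	sketch_cleanup();

	return old->released && !fresh->released && live_ev[cpu] == fresh;
}
```

In this sketch sketch_replay(0) returns true: cleanup only ever looks at the
dead slot, so the event created by the early-running thread survives. With a
single shared slot, the same ordering would release the freshly created event.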

Reverts: a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Feels-stupid Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos

---
 kernel/watchdog_hld.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 71a62ce..f8db56b 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -21,6 +21,7 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
@@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(void)
 
 	if (event) {
 		perf_event_disable(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
 		watchdog_cpus--;
 	}
@@ -218,7 +221,7 @@ void hardlockup_detector_perf_cleanup(void)
 	int cpu;
 
 	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+		struct perf_event *event = per_cpu(dead_event, cpu);
 
 		/*
 		 * Required because for_each_cpu() reports  unconditionally
@@ -226,7 +229,7 @@ void hardlockup_detector_perf_cleanup(void)
 		 */
 		if (event)
 			perf_event_release_kernel(event);
-		per_cpu(watchdog_ev, cpu) = NULL;
+		per_cpu(dead_event_ev, cpu) = NULL;
 	}
 	cpumask_clear(&dead_events_mask);
 }

* [tip:core/urgent] watchdog/hardlockup/perf: Use atomics to track in-use cpu counter
  2017-11-01 18:11       ` Don Zickus
  2017-11-01 18:34         ` Guenter Roeck
@ 2017-11-01 19:46         ` tip-bot for Don Zickus
  2017-11-01 20:28         ` tip-bot for Don Zickus
  2 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Don Zickus @ 2017-11-01 19:46 UTC
  To: linux-tip-commits; +Cc: linux, hpa, dzickus, tglx, mingo, linux-kernel, peterz

Commit-ID:  c7254c8aabe3025770fdb6f2d84aded11716ca2b
Gitweb:     https://git.kernel.org/tip/c7254c8aabe3025770fdb6f2d84aded11716ca2b
Author:     Don Zickus <dzickus@redhat.com>
AuthorDate: Wed, 1 Nov 2017 14:11:27 -0400
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 1 Nov 2017 20:41:28 +0100

watchdog/hardlockup/perf: Use atomics to track in-use cpu counter

Guenter reported:
  There is still a problem. When running 
    echo 6 > /proc/sys/kernel/watchdog_thresh
    echo 5 > /proc/sys/kernel/watchdog_thresh
  repeatedly, the message
 
   NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
 
  stops after a while (after ~10-30 iterations, with fluctuations).
  Maybe watchdog_cpus needs to be atomic ?

That's correct as this again is affected by the asynchronous nature of the
smpboot thread unpark mechanism.

CPU0                            CPU1                    CPU2
write(watchdog_thresh, 6)
  stop()
    park()
  update()
  start()
    unpark()
                                thread->unpark()
                                  cnt++;
write(watchdog_thresh, 5)                               thread->unpark()
  stop()
    park()                      thread->park()
                                  cnt--;                  cnt++;
  update()
  start()
    unpark()

That's not a functional problem, it just affects the informational message.

Convert watchdog_cpus to atomic_t to prevent the problem.
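
For illustration only (not part of the patch): a userspace analogue using C11
<stdatomic.h>. The names sketch_get(), sketch_put(), sketch_count() and users
are hypothetical; atomic_fetch_add() returning the previous value is assumed
to play the role that atomic_fetch_inc() plays in the patch. The sketch does
not spawn threads, it only shows the "old value decides who prints" check.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Role of watchdog_cpus: number of CPUs with an active watchdog event. */
static atomic_int users = 0;

/*
 * Increment and report whether this caller was the first user.
 * atomic_fetch_add() returns the value held before the addition, so
 * at most one of several racing callers observes 0.
 */
bool sketch_get(void)
{
	return atomic_fetch_add(&users, 1) == 0;
}

/* Drop one user; a plain "users--" could lose updates under concurrency. */
void sketch_put(void)
{
	atomic_fetch_sub(&users, 1);
}

/* Current count, for inspection only. */
int sketch_count(void)
{
	return atomic_load(&users);
}
```

The first sketch_get() returns true (the caller that would print the
message), later ones return false until the count has dropped back to zero
through matching sketch_put() calls.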

Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20171101181126.j727fqjmdthjz4xk@redhat.com

---
 kernel/watchdog_hld.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index f8db56b..52218f2 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -12,6 +12,7 @@
 #define pr_fmt(fmt) "NMI watchdog: " fmt
 
 #include <linux/nmi.h>
+#include <linux/atomic.h>
 #include <linux/module.h>
 #include <linux/sched/debug.h>
 
@@ -25,7 +26,7 @@ static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static unsigned int watchdog_cpus;
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -189,7 +190,8 @@ void hardlockup_detector_perf_enable(void)
 	if (hardlockup_detector_event_create())
 		return;
 
-	if (!watchdog_cpus++)
+	/* use original value for check */
+	if (!atomic_fetch_inc(&watchdog_cpus))
 		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
 
 	perf_event_enable(this_cpu_read(watchdog_ev));
@@ -207,7 +209,7 @@ void hardlockup_detector_perf_disable(void)
 		this_cpu_write(watchdog_ev, NULL);
 		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
-		watchdog_cpus--;
+		atomic_dec(&watchdog_cpus);
 	}
 }
 

* [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
  2017-10-31 21:32   ` Thomas Gleixner
                       ` (2 preceding siblings ...)
  2017-11-01 19:46     ` [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") tip-bot for Thomas Gleixner
@ 2017-11-01 20:27     ` tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-11-01 20:27 UTC
  To: linux-tip-commits; +Cc: hpa, tglx, linux, dzickus, linux-kernel, mingo, peterz

Commit-ID:  9c388a5ed1960b2ebbebd3dbe7553092b0c15ec1
Gitweb:     https://git.kernel.org/tip/9c388a5ed1960b2ebbebd3dbe7553092b0c15ec1
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 31 Oct 2017 22:32:00 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 1 Nov 2017 21:18:39 +0100

watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")

Guenter reported a crash in the watchdog/perf code, which is caused by
cleanup() and enable() running concurrently. The reason for this is:

The watchdog functions are serialized via the watchdog_mutex and cpu
hotplug locking, but the enable of the perf based watchdog happens in
context of the unpark callback of the smpboot thread. But that unpark
function is not synchronous inside the locking. The unparking of the thread
just wakes it up and leaves so there is no guarantee when the thread is
executing.

If it starts running _before_ the cleanup has happened, it will create an
event and overwrite the dead event pointer. The new event is then cleaned
up because the event is marked dead.

    lock(watchdog_mutex);
    lockup_detector_reconfigure();
        cpus_read_lock();
        stop();
           park()
        update();
        start();
           unpark()
        cpus_read_unlock();             thread runs()
                                          overwrite dead event ptr
        cleanup();
          free new event, which is active inside perf....
    unlock(watchdog_mutex);

The park side is safe as that actually waits for the thread to reach
parked state.

Commit a33d44843d45 removed the protection against this kind of scenario
under the stupid assumption that the hotplug serialization and the
watchdog_mutex cover everything. 

Bring it back.

Reverts: a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Feels-stupid Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos


---
 kernel/watchdog_hld.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 71a62ce..a7f137c 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -21,6 +21,7 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
@@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(void)
 
 	if (event) {
 		perf_event_disable(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
 		watchdog_cpus--;
 	}
@@ -218,7 +221,7 @@ void hardlockup_detector_perf_cleanup(void)
 	int cpu;
 
 	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+		struct perf_event *event = per_cpu(dead_event, cpu);
 
 		/*
 		 * Required because for_each_cpu() reports  unconditionally
@@ -226,7 +229,7 @@ void hardlockup_detector_perf_cleanup(void)
 		 */
 		if (event)
 			perf_event_release_kernel(event);
-		per_cpu(watchdog_ev, cpu) = NULL;
+		per_cpu(dead_event, cpu) = NULL;
 	}
 	cpumask_clear(&dead_events_mask);
 }

* [tip:core/urgent] watchdog/hardlockup/perf: Use atomics to track in-use cpu counter
  2017-11-01 18:11       ` Don Zickus
  2017-11-01 18:34         ` Guenter Roeck
  2017-11-01 19:46         ` [tip:core/urgent] watchdog/hardlockup/perf: Use atomics to track in-use cpu counter tip-bot for Don Zickus
@ 2017-11-01 20:28         ` tip-bot for Don Zickus
  2 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Don Zickus @ 2017-11-01 20:28 UTC
  To: linux-tip-commits; +Cc: hpa, linux-kernel, dzickus, linux, mingo, tglx, peterz

Commit-ID:  42f930da7f00c0ab23df4c7aed36137f35988980
Gitweb:     https://git.kernel.org/tip/42f930da7f00c0ab23df4c7aed36137f35988980
Author:     Don Zickus <dzickus@redhat.com>
AuthorDate: Wed, 1 Nov 2017 14:11:27 -0400
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 1 Nov 2017 21:18:40 +0100

watchdog/hardlockup/perf: Use atomics to track in-use cpu counter

Guenter reported:
  There is still a problem. When running 
    echo 6 > /proc/sys/kernel/watchdog_thresh
    echo 5 > /proc/sys/kernel/watchdog_thresh
  repeatedly, the message
 
   NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
 
  stops after a while (after ~10-30 iterations, with fluctuations).
  Maybe watchdog_cpus needs to be atomic ?

That's correct as this again is affected by the asynchronous nature of the
smpboot thread unpark mechanism.

CPU0                            CPU1                    CPU2
write(watchdog_thresh, 6)
  stop()
    park()
  update()
  start()
    unpark()
                                thread->unpark()
                                  cnt++;
write(watchdog_thresh, 5)                               thread->unpark()
  stop()
    park()                      thread->park()
                                  cnt--;                  cnt++;
  update()
  start()
    unpark()

That's not a functional problem, it just affects the informational message.

Convert watchdog_cpus to atomic_t to prevent the problem.

Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20171101181126.j727fqjmdthjz4xk@redhat.com


---
 kernel/watchdog_hld.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index a7f137c..a84b205 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -12,6 +12,7 @@
 #define pr_fmt(fmt) "NMI watchdog: " fmt
 
 #include <linux/nmi.h>
+#include <linux/atomic.h>
 #include <linux/module.h>
 #include <linux/sched/debug.h>
 
@@ -25,7 +26,7 @@ static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static unsigned int watchdog_cpus;
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -189,7 +190,8 @@ void hardlockup_detector_perf_enable(void)
 	if (hardlockup_detector_event_create())
 		return;
 
-	if (!watchdog_cpus++)
+	/* use original value for check */
+	if (!atomic_fetch_inc(&watchdog_cpus))
 		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
 
 	perf_event_enable(this_cpu_read(watchdog_ev));
@@ -207,7 +209,7 @@ void hardlockup_detector_perf_disable(void)
 		this_cpu_write(watchdog_ev, NULL);
 		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
-		watchdog_cpus--;
+		atomic_dec(&watchdog_cpus);
 	}
 }
 

* Re: [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
  2017-11-01 19:46     ` [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") tip-bot for Thomas Gleixner
@ 2017-11-01 20:32       ` Guenter Roeck
  2017-11-01 20:52         ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Guenter Roeck @ 2017-11-01 20:32 UTC
  To: mingo, linux-kernel, peterz, tglx, dzickus, hpa; +Cc: linux-tip-commits

On Wed, Nov 01, 2017 at 12:46:00PM -0700, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  1c294733b7b9f712f78d15cfa75ffdea72b79abb
> Gitweb:     https://git.kernel.org/tip/1c294733b7b9f712f78d15cfa75ffdea72b79abb
> Author:     Thomas Gleixner <tglx@linutronix.de>
> AuthorDate: Tue, 31 Oct 2017 22:32:00 +0100
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Wed, 1 Nov 2017 20:41:27 +0100
> 
> watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> 
> Guenter reported a crash in the watchdog/perf code, which is caused by
> cleanup() and enable() running concurrently. The reason for this is:
> 
> The watchdog functions are serialized via the watchdog_mutex and cpu
> hotplug locking, but the enable of the perf based watchdog happens in
> context of the unpark callback of the smpboot thread. But that unpark
> function is not synchronous inside the locking. The unparking of the thread
> just wakes it up and leaves so there is no guarantee when the thread is
> executing.
> 
> If it starts running _before_ the cleanup has happened, it will create an
> event and overwrite the dead event pointer. The new event is then cleaned
> up because the event is marked dead.
> 
>     lock(watchdog_mutex);
>     lockup_detector_reconfigure();
>         cpus_read_lock();
>         stop();
>            park()
>         update();
>         start();
>            unpark()
>         cpus_read_unlock();             thread runs()
>                                           overwrite dead event ptr
>         cleanup();
>           free new event, which is active inside perf....
>     unlock(watchdog_mutex);
> 
> The park side is safe as that actually waits for the thread to reach
> parked state.
> 
> Commit a33d44843d45 removed the protection against this kind of scenario
> under the stupid assumption that the hotplug serialization and the
> watchdog_mutex cover everything. 
> 
> Bring it back.
> 
> Reverts: a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Thomas Feels-stupid Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Don Zickus <dzickus@redhat.com>
> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos
> 
> ---
>  kernel/watchdog_hld.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 71a62ce..f8db56b 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -21,6 +21,7 @@
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>  static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
> +static DEFINE_PER_CPU(struct perf_event *, dead_event);
>  static struct cpumask dead_events_mask;
>  
>  static unsigned long hardlockup_allcpu_dumped;
> @@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(void)
>  
>  	if (event) {
>  		perf_event_disable(event);
> +		this_cpu_write(watchdog_ev, NULL);
> +		this_cpu_write(dead_event, event);
>  		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
>  		watchdog_cpus--;
>  	}
> @@ -218,7 +221,7 @@ void hardlockup_detector_perf_cleanup(void)
>  	int cpu;
>  
>  	for_each_cpu(cpu, &dead_events_mask) {
> -		struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +		struct perf_event *event = per_cpu(dead_event, cpu);
>  
>  		/*
>  		 * Required because for_each_cpu() reports  unconditionally
> @@ -226,7 +229,7 @@ void hardlockup_detector_perf_cleanup(void)
>  		 */
>  		if (event)
>  			perf_event_release_kernel(event);
> -		per_cpu(watchdog_ev, cpu) = NULL;
> +		per_cpu(dead_event_ev, cpu) = NULL;

Uuhh ... there is an extra _ev which somehow slipped in and doesn't
compile.

Guenter

>  	}
>  	cpumask_clear(&dead_events_mask);
>  }

* Re: [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
  2017-11-01 20:32       ` Guenter Roeck
@ 2017-11-01 20:52         ` Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Gleixner @ 2017-11-01 20:52 UTC
  To: Guenter Roeck
  Cc: mingo, linux-kernel, peterz, dzickus, hpa, linux-tip-commits

On Wed, 1 Nov 2017, Guenter Roeck wrote:
> > +		per_cpu(dead_event_ev, cpu) = NULL;
> 
> Uuhh ... there is an extra _ev which somehow slipped in and doesn't
> compile.

Zero bot already noticed and yelled at me. I have no clue how I managed to
fat-finger that after compiling it ....

Fixed and repushed.

Thread overview: 20+ messages
2017-10-30 22:45 Crashes in perf_event_ctx_lock_nested Guenter Roeck
2017-10-31 13:48 ` Peter Zijlstra
2017-10-31 17:16   ` Guenter Roeck
2017-10-31 18:50     ` Don Zickus
2017-10-31 20:12       ` Guenter Roeck
2017-10-31 20:23         ` Don Zickus
2017-10-31 21:32   ` Thomas Gleixner
2017-10-31 22:11     ` Guenter Roeck
2017-11-01 18:11       ` Don Zickus
2017-11-01 18:34         ` Guenter Roeck
2017-11-01 19:46         ` [tip:core/urgent] watchdog/hardlockup/perf: Use atomics to track in-use cpu counter tip-bot for Don Zickus
2017-11-01 20:28         ` tip-bot for Don Zickus
2017-11-01 18:22       ` Crashes in perf_event_ctx_lock_nested Thomas Gleixner
2017-11-01  8:14     ` Peter Zijlstra
2017-11-01  8:26       ` Thomas Gleixner
2017-11-01 19:46     ` [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") tip-bot for Thomas Gleixner
2017-11-01 20:32       ` Guenter Roeck
2017-11-01 20:52         ` Thomas Gleixner
2017-11-01 20:27     ` tip-bot for Thomas Gleixner
2017-10-31 18:48 ` Crashes in perf_event_ctx_lock_nested Don Zickus
