All of lore.kernel.org
 help / color / mirror / Atom feed
From: Don Zickus <dzickus@redhat.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>
Subject: Re: Crashes in perf_event_ctx_lock_nested
Date: Tue, 31 Oct 2017 14:48:53 -0400	[thread overview]
Message-ID: <20171031184853.y4bliffza6hdhjkk@redhat.com> (raw)
In-Reply-To: <20171030224512.GA13592@roeck-us.net>

On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> Hi Thomas,
> 
> we are seeing the following crash in v4.14-rc5/rc7 if CONFIG_HARDLOCKUP_DETECTOR
> is enabled.
> 
> [    5.908021] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> [    5.915836]
> ==================================================================
> [    5.917325] Unsafe core_pattern used with fs.suid_dumpable=2.
> [    5.917325] Pipe handler or fully qualified core dump path required.
> [    5.917325] Set kernel.core_pattern before fs.suid_dumpable.
> [    5.924046] udevd[147]: starting version 225
> [    5.948520] BUG: KASAN: null-ptr-deref in perf_event_ctx_lock_nested.isra.71+0x22/0x89
> [    5.957380] Read of size 8 at addr 0000000000000208 by task watchdog/2/21
> [    5.964973]
> [    5.966646] CPU: 2 PID: 21 Comm: watchdog/2 Not tainted 4.14.0-rc7 #30
> [    5.973947] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.95.0 09/27/2017
> [    5.982128] Call Trace:
> [    5.984874]  dump_stack+0x4d/0x63
> [    5.988585]  kasan_report+0x24b/0x295
> [    5.992691]  ? watchdog_nmi_enable+0x12/0x12
> [    5.997477]  __asan_load8+0x81/0x83
> [    6.001388]  perf_event_ctx_lock_nested.isra.71+0x22/0x89
> [    6.007437]  perf_event_enable+0xf/0x27
> [    6.011737]  hardlockup_detector_perf_enable+0x3e/0x40
> [    6.017493]  watchdog_nmi_enable+0xe/0x12
> [    6.021990]  watchdog_enable+0x8c/0xc5
> [    6.026195]  smpboot_thread_fn+0x27a/0x3c7
> [    6.030788]  ? sort_range+0x22/0x22
> [    6.034701]  kthread+0x221/0x231
> [    6.038321]  ? kthread_flush_work+0x120/0x120
> [    6.043204]  ret_from_fork+0x22/0x30
> [    6.047207]
> ==================================================================
> ...
> [    6.134561] BUG: unable to handle kernel NULL pointer dereference at 0000000000000208
> [    6.143316] IP: perf_event_ctx_lock_nested.isra.71+0x22/0x89
> [    6.149645] PGD 0 P4D 0 
> [    6.152478] Oops: 0000 [#1] PREEMPT SMP KASAN
> [    6.157350] Modules linked in:
> [    6.160766] CPU: 2 PID: 21 Comm: watchdog/2 Tainted: G    B 4.14.0-rc7 #30
> [    6.169422] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.95.0 09/27/2017
> [    6.177583] task: ffff8803eacd1100 task.stack: ffff8803eacf8000
> [    6.184206] RIP: 0010:perf_event_ctx_lock_nested.isra.71+0x22/0x89
> [    6.191118] RSP: 0018:ffff8803eacffe10 EFLAGS: 00010246
> [    6.196962] RAX: 0000000000000296 RBX: 0000000000000000 RCX: ffffffffa52ee8a5
> [    6.204941] RDX: d8ecf37b519af800 RSI: 0000000000000003 RDI: ffffffffa6274610
> [    6.212911] RBP: ffff8803eacffe30 R08: dffffc0000000000 R09: 0000000000000000
> [    6.220888] R10: ffffed007d59ffa9 R11: ffffc9000044c1a1 R12: 0000000000000000
> [    6.228867] R13: ffff8803eacd1100 R14: 0000000000000208 R15: ffffffffa535ce54
> [    6.231476] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode.  Opts: commit=600
> [    6.237449] EXT4-fs (mmcblk0p8): mounted filesystem with ordered data mode.  Opts: (null)
> [    6.255332] FS:  0000000000000000(0000) GS:ffff8803ed500000(0000) knlGS:0000000000000000
> [    6.264384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    6.270812] CR2: 0000000000000208 CR3: 0000000430615001 CR4: 00000000003606e0
> [    6.278789] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    6.286761] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    6.294741] Call Trace:
> [    6.297480]  perf_event_enable+0xf/0x27
> [    6.301771]  hardlockup_detector_perf_enable+0x3e/0x40
> [    6.307515]  watchdog_nmi_enable+0xe/0x12 
> [    6.311990]  watchdog_enable+0x8c/0xc5
> [    6.316176]  smpboot_thread_fn+0x27a/0x3c7
> [    6.320757]  ? sort_range+0x22/0x22
> [    6.324650]  kthread+0x221/0x231
> [    6.328251]  ? kthread_flush_work+0x120/0x120
> [    6.333114]  ret_from_fork+0x22/0x30
> [    6.337107] Code: a5 e8 70 58 f6 ff 5b 5d c3 55 48 89 e5 41 56 4c 8d b7 08 02
> 00 00 41 55 41 54 49 89 fc 53 e8 1a e9 f5 ff 4c 89 f7 e8 8d d3 07 00 <49> 8b 9c
> 24 08 02 00 00 31 d2 be 01 00 00 00 48 8d bb b0 00 00
> [    6.358230] RIP: perf_event_ctx_lock_nested.isra.71+0x22/0x89 RSP: ffff8803eacffe10
> [    6.366779] CR2: 0000000000000208
> [    6.370477] ---[ end trace ff68e1917f0a2044 ]---
> [    6.383531] Kernel panic - not syncing: Fatal exception
> [    6.389640] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> 
> The problem is a heisenbug - slight changes in the code, such as added logging,
> can make it disappear.
> 
> I added some logging and a long msleep() in hardlockup_detector_perf_cleanup().
> Here is the result:
> 
> [    0.274361] NMI watchdog: ############ hardlockup_detector_perf_init
> [    0.274915] NMI watchdog: ############ hardlockup_detector_event_create(0)
> [    0.277049] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> [    0.277593] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> [    0.278027] NMI watchdog: ############ hardlockup_detector_event_create(0)
> [    1.312044] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> [    1.385122] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> [    1.386028] NMI watchdog: ############ hardlockup_detector_event_create(1)
> [    1.466102] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> [    1.475536] NMI watchdog: ############ hardlockup_detector_event_create(2)
> [    1.535099] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> [    1.535101] NMI watchdog: ############ hardlockup_detector_event_create(3)
> [    7.222816] NMI watchdog: ############ hardlockup_detector_perf_disable(0)
> [    7.230567] NMI watchdog: ############ hardlockup_detector_perf_disable(1)
> [    7.243138] NMI watchdog: ############ hardlockup_detector_perf_disable(2)
> [    7.250966] NMI watchdog: ############ hardlockup_detector_perf_disable(3)
> [    7.258826] NMI watchdog: ############ hardlockup_detector_perf_enable(1)
> [    7.258827] NMI watchdog: ############ hardlockup_detector_perf_cleanup
> [    7.258831] NMI watchdog: ############ hardlockup_detector_perf_enable(2)
> [    7.258833] NMI watchdog: ############ hardlockup_detector_perf_enable(0)
> [    7.258834] NMI watchdog: ############ hardlockup_detector_event_create(2)
> [    7.258835] NMI watchdog: ############ hardlockup_detector_event_create(0)
> [    7.260169] NMI watchdog: ############ hardlockup_detector_perf_enable(3)
> [    7.260170] NMI watchdog: ############ hardlockup_detector_event_create(3)
> [    7.494251] NMI watchdog: ############ hardlockup_detector_event_create(1)
> [    8.287135] NMI watchdog: ############ hardlockup_detector_perf_cleanup done
> 
> Looks like there are a number of problems: hardlockup_detector_event_create()
> creates the event data structure even if it is already created, and
> hardlockup_detector_perf_cleanup() runs unprotected and in parallel to
> the enable/create functions.

Hi,

Thomas created a deferred cleanup mechanism to help with hotplugging, so yes
an event tries to be re-created if the deferred cleanup wasn't run yet.  There
is probably a bug there to remove the re-created event off of the deferred
'dead_events_mask'.  And maybe some locking there too, though I don't know
if adding locking around that queue re-introduces the deadlock to the
hotplug code.  See commit 941154bd6937a710ae9193a3c733c0029e5ae7b8 for
details.


> 
> ALso, the following message is seen twice.
> 
> [    0.278758] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> [    7.258838] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> 
> I don't offer a proposed patch since I have no idea how to best solve the
> problem.
> 
> Also, is the repeated enable/disable/cleanup as part of the normal boot
> really necessary ?

Yes, part of it.  The first event_create(0) is a hardware check.  Does the
hardware support perf counters.  The event is removed after the check. If
the check passes, allow all cpus to create the event.  Hence the second
event_create(0).  Otherwise, disable the counters from being enabled on
other cpus.

What doesn't make sense is the perf_disable(0-3) at the 7 second mark.  Not
sure why that happens.  But it seems to cause the problem by exposing a race
of disabling the event and then re-creating it.

What is odd, is the panic.

	hardlockup_detector_perf_enable ->
		hardlockup_detector_event_create
		perf_event_enable ->
			panic()

due to the delay of hardlockup_detector_perf_cleanup(), I would have assumed
hardlockup_detector_event_create() would have warned and returned
had the cleanup not been invoked yet.

The only race I can see here:

CPU A						CPU B
						hardlockup_detector_perf_cleanup
hardlockup_detector_perf_enable				 perf_event_release_kernel
	hardlockup_detector_event_create
							per_cpu(watchdog_ev, cpu) = NULL;
	perf_event_enable(this_cpu_read(watchdog_ev)
					^^^NULL


I am guessing adding a check in hardlockup_detector_perf_enable() would
prove that, but as you said, it might may the problem go into hiding.


The below patch probably doesn't solve the problem, just makes the race
condition a lot smaller?  Though I don't know if this causes more problems
internally to perf with two attached events temporarily.

Cheers,
Don


diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 71a62ceacdc8..0b9bd1e0bf57 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -221,12 +221,16 @@ void hardlockup_detector_perf_cleanup(void)
 		struct perf_event *event = per_cpu(watchdog_ev, cpu);
 
 		/*
+ 		 * Clear immediately to avoid delay of releasing event.
+ 		 */
+		per_cpu(watchdog_ev, cpu) = NULL;
+
+		/*
 		 * Required because for_each_cpu() reports  unconditionally
 		 * CPU0 as set on UP kernels. Sigh.
 		 */
 		if (event)
 			perf_event_release_kernel(event);
-		per_cpu(watchdog_ev, cpu) = NULL;
 	}
 	cpumask_clear(&dead_events_mask);
 }

      parent reply	other threads:[~2017-10-31 18:48 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-30 22:45 Crashes in perf_event_ctx_lock_nested Guenter Roeck
2017-10-31 13:48 ` Peter Zijlstra
2017-10-31 17:16   ` Guenter Roeck
2017-10-31 18:50     ` Don Zickus
2017-10-31 20:12       ` Guenter Roeck
2017-10-31 20:23         ` Don Zickus
2017-10-31 21:32   ` Thomas Gleixner
2017-10-31 22:11     ` Guenter Roeck
2017-11-01 18:11       ` Don Zickus
2017-11-01 18:34         ` Guenter Roeck
2017-11-01 19:46         ` [tip:core/urgent] watchdog/hardlockup/perf: Use atomics to track in-use cpu counter tip-bot for Don Zickus
2017-11-01 20:28         ` tip-bot for Don Zickus
2017-11-01 18:22       ` Crashes in perf_event_ctx_lock_nested Thomas Gleixner
2017-11-01  8:14     ` Peter Zijlstra
2017-11-01  8:26       ` Thomas Gleixner
2017-11-01 19:46     ` [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") tip-bot for Thomas Gleixner
2017-11-01 20:32       ` Guenter Roeck
2017-11-01 20:52         ` Thomas Gleixner
2017-11-01 20:27     ` tip-bot for Thomas Gleixner
2017-10-31 18:48 ` Don Zickus [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171031184853.y4bliffza6hdhjkk@redhat.com \
    --to=dzickus@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=mingo@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.