From: Guenter Roeck <linux@roeck-us.net> To: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org, Don Zickus <dzickus@redhat.com>, Ingo Molnar <mingo@kernel.org> Subject: Crashes in perf_event_ctx_lock_nested Date: Mon, 30 Oct 2017 15:45:12 -0700 [thread overview] Message-ID: <20171030224512.GA13592@roeck-us.net> (raw) Hi Thomas, we are seeing the following crash in v4.14-rc5/rc7 if CONFIG_HARDLOCKUP_DETECTOR is enabled. [ 5.908021] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter. [ 5.915836] ================================================================== [ 5.917325] Unsafe core_pattern used with fs.suid_dumpable=2. [ 5.917325] Pipe handler or fully qualified core dump path required. [ 5.917325] Set kernel.core_pattern before fs.suid_dumpable. [ 5.924046] udevd[147]: starting version 225 [ 5.948520] BUG: KASAN: null-ptr-deref in perf_event_ctx_lock_nested.isra.71+0x22/0x89 [ 5.957380] Read of size 8 at addr 0000000000000208 by task watchdog/2/21 [ 5.964973] [ 5.966646] CPU: 2 PID: 21 Comm: watchdog/2 Not tainted 4.14.0-rc7 #30 [ 5.973947] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.95.0 09/27/2017 [ 5.982128] Call Trace: [ 5.984874] dump_stack+0x4d/0x63 [ 5.988585] kasan_report+0x24b/0x295 [ 5.992691] ? watchdog_nmi_enable+0x12/0x12 [ 5.997477] __asan_load8+0x81/0x83 [ 6.001388] perf_event_ctx_lock_nested.isra.71+0x22/0x89 [ 6.007437] perf_event_enable+0xf/0x27 [ 6.011737] hardlockup_detector_perf_enable+0x3e/0x40 [ 6.017493] watchdog_nmi_enable+0xe/0x12 [ 6.021990] watchdog_enable+0x8c/0xc5 [ 6.026195] smpboot_thread_fn+0x27a/0x3c7 [ 6.030788] ? sort_range+0x22/0x22 [ 6.034701] kthread+0x221/0x231 [ 6.038321] ? kthread_flush_work+0x120/0x120 [ 6.043204] ret_from_fork+0x22/0x30 [ 6.047207] ================================================================== ... [ 6.134561] BUG: unable to handle kernel NULL pointer dereference at 0000000000000208 [ 6.143316] IP: perf_event_ctx_lock_nested.isra.71+0x22/0x89 [ 6.149645] PGD 0 P4D 0 [ 6.152478] Oops: 0000 [#1] PREEMPT SMP KASAN [ 6.157350] Modules linked in: [ 6.160766] CPU: 2 PID: 21 Comm: watchdog/2 Tainted: G B 4.14.0-rc7 #30 [ 6.169422] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.95.0 09/27/2017 [ 6.177583] task: ffff8803eacd1100 task.stack: ffff8803eacf8000 [ 6.184206] RIP: 0010:perf_event_ctx_lock_nested.isra.71+0x22/0x89 [ 6.191118] RSP: 0018:ffff8803eacffe10 EFLAGS: 00010246 [ 6.196962] RAX: 0000000000000296 RBX: 0000000000000000 RCX: ffffffffa52ee8a5 [ 6.204941] RDX: d8ecf37b519af800 RSI: 0000000000000003 RDI: ffffffffa6274610 [ 6.212911] RBP: ffff8803eacffe30 R08: dffffc0000000000 R09: 0000000000000000 [ 6.220888] R10: ffffed007d59ffa9 R11: ffffc9000044c1a1 R12: 0000000000000000 [ 6.228867] R13: ffff8803eacd1100 R14: 0000000000000208 R15: ffffffffa535ce54 [ 6.231476] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: commit=600 [ 6.237449] EXT4-fs (mmcblk0p8): mounted filesystem with ordered data mode. Opts: (null) [ 6.255332] FS: 0000000000000000(0000) GS:ffff8803ed500000(0000) knlGS:0000000000000000 [ 6.264384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.270812] CR2: 0000000000000208 CR3: 0000000430615001 CR4: 00000000003606e0 [ 6.278789] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6.286761] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6.294741] Call Trace: [ 6.297480] perf_event_enable+0xf/0x27 [ 6.301771] hardlockup_detector_perf_enable+0x3e/0x40 [ 6.307515] watchdog_nmi_enable+0xe/0x12 [ 6.311990] watchdog_enable+0x8c/0xc5 [ 6.316176] smpboot_thread_fn+0x27a/0x3c7 [ 6.320757] ? sort_range+0x22/0x22 [ 6.324650] kthread+0x221/0x231 [ 6.328251] ? kthread_flush_work+0x120/0x120 [ 6.333114] ret_from_fork+0x22/0x30 [ 6.337107] Code: a5 e8 70 58 f6 ff 5b 5d c3 55 48 89 e5 41 56 4c 8d b7 08 02 00 00 41 55 41 54 49 89 fc 53 e8 1a e9 f5 ff 4c 89 f7 e8 8d d3 07 00 <49> 8b 9c 24 08 02 00 00 31 d2 be 01 00 00 00 48 8d bb b0 00 00 [ 6.358230] RIP: perf_event_ctx_lock_nested.isra.71+0x22/0x89 RSP: ffff8803eacffe10 [ 6.366779] CR2: 0000000000000208 [ 6.370477] ---[ end trace ff68e1917f0a2044 ]--- [ 6.383531] Kernel panic - not syncing: Fatal exception [ 6.389640] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) The problem is a heisenbug - slight changes in the code, such as added logging, can make it disappear. I added some logging and a long msleep() in hardlockup_detector_perf_cleanup(). Here is the result: [ 0.274361] NMI watchdog: ############ hardlockup_detector_perf_init [ 0.274915] NMI watchdog: ############ hardlockup_detector_event_create(0) [ 0.277049] NMI watchdog: ############ hardlockup_detector_perf_cleanup [ 0.277593] NMI watchdog: ############ hardlockup_detector_perf_enable(0) [ 0.278027] NMI watchdog: ############ hardlockup_detector_event_create(0) [ 1.312044] NMI watchdog: ############ hardlockup_detector_perf_cleanup done [ 1.385122] NMI watchdog: ############ hardlockup_detector_perf_enable(1) [ 1.386028] NMI watchdog: ############ hardlockup_detector_event_create(1) [ 1.466102] NMI watchdog: ############ hardlockup_detector_perf_enable(2) [ 1.475536] NMI watchdog: ############ hardlockup_detector_event_create(2) [ 1.535099] NMI watchdog: ############ hardlockup_detector_perf_enable(3) [ 1.535101] NMI watchdog: ############ hardlockup_detector_event_create(3) [ 7.222816] NMI watchdog: ############ hardlockup_detector_perf_disable(0) [ 7.230567] NMI watchdog: ############ hardlockup_detector_perf_disable(1) [ 7.243138] NMI watchdog: ############ hardlockup_detector_perf_disable(2) [ 7.250966] NMI watchdog: ############ hardlockup_detector_perf_disable(3) [ 7.258826] NMI watchdog: ############ hardlockup_detector_perf_enable(1) [ 7.258827] NMI watchdog: ############ hardlockup_detector_perf_cleanup [ 7.258831] NMI watchdog: ############ hardlockup_detector_perf_enable(2) [ 7.258833] NMI watchdog: ############ hardlockup_detector_perf_enable(0) [ 7.258834] NMI watchdog: ############ hardlockup_detector_event_create(2) [ 7.258835] NMI watchdog: ############ hardlockup_detector_event_create(0) [ 7.260169] NMI watchdog: ############ hardlockup_detector_perf_enable(3) [ 7.260170] NMI watchdog: ############ hardlockup_detector_event_create(3) [ 7.494251] NMI watchdog: ############ hardlockup_detector_event_create(1) [ 8.287135] NMI watchdog: ############ hardlockup_detector_perf_cleanup done Looks like there are a number of problems: hardlockup_detector_event_create() creates the event data structure even if it is already created, and hardlockup_detector_perf_cleanup() runs unprotected and in parallel to the enable/create functions. ALso, the following message is seen twice. [ 0.278758] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter. [ 7.258838] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter. I don't offer a proposed patch since I have no idea how to best solve the problem. Also, is the repeated enable/disable/cleanup as part of the normal boot really necessary ? Thanks, Guenter
next reply other threads:[~2017-10-30 22:45 UTC|newest] Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-10-30 22:45 Guenter Roeck [this message] 2017-10-31 13:48 ` Peter Zijlstra 2017-10-31 17:16 ` Guenter Roeck 2017-10-31 18:50 ` Don Zickus 2017-10-31 20:12 ` Guenter Roeck 2017-10-31 20:23 ` Don Zickus 2017-10-31 21:32 ` Thomas Gleixner 2017-10-31 22:11 ` Guenter Roeck 2017-11-01 18:11 ` Don Zickus 2017-11-01 18:34 ` Guenter Roeck 2017-11-01 19:46 ` [tip:core/urgent] watchdog/hardlockup/perf: Use atomics to track in-use cpu counter tip-bot for Don Zickus 2017-11-01 20:28 ` tip-bot for Don Zickus 2017-11-01 18:22 ` Crashes in perf_event_ctx_lock_nested Thomas Gleixner 2017-11-01 8:14 ` Peter Zijlstra 2017-11-01 8:26 ` Thomas Gleixner 2017-11-01 19:46 ` [tip:core/urgent] watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") tip-bot for Thomas Gleixner 2017-11-01 20:32 ` Guenter Roeck 2017-11-01 20:52 ` Thomas Gleixner 2017-11-01 20:27 ` tip-bot for Thomas Gleixner 2017-10-31 18:48 ` Crashes in perf_event_ctx_lock_nested Don Zickus
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20171030224512.GA13592@roeck-us.net \ --to=linux@roeck-us.net \ --cc=dzickus@redhat.com \ --cc=linux-kernel@vger.kernel.org \ --cc=mingo@kernel.org \ --cc=tglx@linutronix.de \ --subject='Re: Crashes in perf_event_ctx_lock_nested' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.