From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755655AbbFOMta (ORCPT ); Mon, 15 Jun 2015 08:49:30 -0400 Received: from mga02.intel.com ([134.134.136.20]:23580 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754130AbbFOMtW (ORCPT ); Mon, 15 Jun 2015 08:49:22 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,618,1427785200"; d="scan'208";a="508513882" From: Alexander Shishkin To: Peter Zijlstra , Vince Weaver Cc: linux-kernel@vger.kernel.org, Ingo Molnar , Arnaldo Carvalho de Melo , Stephane Eranian Subject: Re: perf: aux area related crash and warnings In-Reply-To: <20150615122054.GY3644@twins.programming.kicks-ass.net> References: <20150615122054.GY3644@twins.programming.kicks-ass.net> User-Agent: Notmuch/0.20.1 (http://notmuchmail.org) Emacs/24.4.1 (x86_64-pc-linux-gnu) Date: Mon, 15 Jun 2015 15:49:19 +0300 Message-ID: <87egld2l2o.fsf@ashishki-desk.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Peter Zijlstra writes: > Alex, any clue? Let me look into it. Definitely haven't seen anything like that in my tests. > On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote: >> On Thu, 11 Jun 2015, Vince Weaver wrote: >> >> > and while I was trying to cut and paste that, the locked haswell just took >> > down the network switch so I can't get the rest until tomorrow. >> >> here are the full dumps if anyone is interested >> >> the warning is reproducible, the spinlock disaster is not. >> >> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410 >> [36298.992915] lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4 >> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155 >> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014 >> [36299.020606] ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054 >> [36299.029199] ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0 >> [36299.037796] ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0 >> [36299.046338] Call Trace: >> [36299.049501] [] dump_stack+0x45/0x57 >> [36299.056284] [] spin_dump+0x85/0xe0 >> [36299.062282] [] spin_bug+0x26/0x30 >> [36299.068111] [] do_raw_spin_lock+0x13f/0x180 >> [36299.074897] [] _raw_spin_lock+0x39/0x40 >> [36299.081276] [] ? free_pcppages_bulk+0x39/0x620 >> [36299.088340] [] free_pcppages_bulk+0x39/0x620 >> [36299.095182] [] ? free_pages_prepare+0x3a4/0x550 >> [36299.102291] [] ? kfree_debugcheck+0x16/0x40 >> [36299.108987] [] free_hot_cold_page+0x178/0x1a0 >> [36299.115850] [] __free_pages+0x37/0x50 >> [36299.121991] [] rb_free_aux+0xba/0xf0 This one goes to free aux pages from nmi context, looks like aux buffer was unmapped while the event was running, so here it dropped the last reference. >> [36299.128034] [] perf_aux_output_end+0xb7/0xf0 >> [36299.134793] [] intel_bts_interrupt+0x8e/0xd0 >> [36299.141543] [] intel_pmu_handle_irq+0x4f/0x450 >> [36299.148482] [] ? check_chain_key+0x128/0x1e0 >> [36299.155249] [] perf_event_nmi_handler+0x2b/0x50 >> [36299.162273] [] nmi_handle+0xa0/0x150 [snip] >> >> and >> >> [27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 perf_aux_output_begin+0x1ce/0x1f0() >> [27716.796157] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel snd_hda_core aes_x86_64 iTCO_wdt lrw snd_hwdep gf128mul snd_pcm iTCO_vendor_support ppdev i915snd_timer glue_helper evdev ablk_helper snd cryptd mei_me mei psmouse pcspkr soundcore serio_raw lpc_ich drm_kms_helper parport_pc battery tpm_tis mfd_core parport drm video tpm i2c_i801 i2c_algo_bit wmi processor button sg sr_mod cdrom sd_mod ehci_pci ehci_hcd ahci libahci libata xhci_pci xhci_hcd e1000e ptp scsi_mod crc32c_intel usbcore pps_core usb_common thermal fan thermal_sys >> [27716.873350] CPU: 2 PID: 17655 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155 >> [27716.882671] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014 >> [27716.891168] ffffffff81a230f2 ffff8800cf9afc68 ffffffff816d7229 0000000000000000 >> [27716.899782] 0000000000000000 ffff8800cf9afca8 ffffffff81072eba 000000000000cf80 >> [27716.908372] ffff880036d85a00 ffff88011ea8cf80 ffff880119106000 000000000000cf80 >> [27716.916981] Call Trace: >> [27716.920180] [] dump_stack+0x45/0x57 >> [27716.926250] [] warn_slowpath_common+0x8a/0xc0 >> [27716.933244] [] warn_slowpath_null+0x1a/0x20 >> [27716.939999] [] perf_aux_output_begin+0x1ce/0x1f0 >> [27716.947225] [] ? perf_event_update_userpage+0x5/0x170 >> [27716.954875] [] bts_event_add+0x6d/0x150 >> [27716.961259] [] event_sched_in.isra.85+0xfb/0x330 >> [27716.968495] [] group_sched_in+0xba/0x1e0 >> [27716.974957] [] ? native_sched_clock+0x2a/0x90 >> [27716.981888] [] ctx_sched_in+0x1d0/0x3f0 >> [27716.988274] [] perf_event_sched_in+0x69/0xa0 >> [27716.995082] [] perf_event_context_sched_in+0x90/0x160 >> [27717.002714] [] __perf_event_task_sched_in+0xa8/0xc0 >> [27717.010160] [] finish_task_switch+0xed/0x140 >> [27717.016988] [] ? __switch_to+0x263/0x5d0 >> [27717.023400] [] schedule_tail+0x11/0xa0 >> [27717.029652] [] ret_from_fork+0xf/0x70 >> [27717.035806] ---[ end trace bdbfeef5d81e5fa4 ]--- >> >> which corresponds to >> /* >> * Nesting is not supported for AUX area, make sure nested >> * writers are caught early >> */ >> if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1))) >> goto err_put; >> This really shouldn't happen, I'd need to check with the fuzzer. Thanks, -- Alex