All of lore.kernel.org
 help / color / mirror / Atom feed
* perf: aux area related crash and warnings
@ 2015-06-12  3:15 Vince Weaver
  2015-06-12 18:42 ` Vince Weaver
  0 siblings, 1 reply; 14+ messages in thread
From: Vince Weaver @ 2015-06-12  3:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian


The fuzzer turned these up (this is 4.1-rc7 with the fasync patch 
applied) on a Haswell.  I'm listing the crash first, but the warning
happened earlier, not sure if it is related.

[36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
[36298.992915]  lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
[36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
[36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[36299.020606]  ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
[36299.029199]  ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
[36299.037796]  ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
[36299.046338] Call Trace:
[36299.049501]  <NMI>  [<ffffffff816d7229>] dump_stack+0x45/0x57
[36299.056284]  [<ffffffff810c2895>] spin_dump+0x85/0xe0
[36299.062282]  [<ffffffff810c2916>] spin_bug+0x26/0x30
[36299.068111]  [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
[36299.074897]  [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
[36299.081276]  [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
[36299.088340]  [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
[36299.095182]  [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
[36299.102291]  [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
[36299.108987]  [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
[36299.115850]  [<ffffffff8117aa47>] __free_pages+0x37/0x50
[36299.121991]  [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
[36299.128034]  [<ffffffff8116b0e7>] perf_aux_output_end+0xb7/0xf0
[36299.134793]  [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
[36299.141543]  [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
[36299.148482]  [<ffffffff810bc288>] ? check_chain_key+0x128/0x1e0
[36299.155249]  [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
[36299.162273]  [<ffffffff810185d0>] nmi_handle+0xa0/0x150
[36299.168278]  [<ffffffff81018535>] ? nmi_handle+0x5/0x150
[36299.174377]  [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
[36299.180735]  [<ffffffff81018a08>] do_nmi+0x98/0xe0
[36299.186219]  [<ffffffff816e13ef>] end_repeat_nmi+0x1e/0x2e
[36299.192501]  [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.199951]  [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.207410]  [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.214898]  <<EOE>>  [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000

and while I was trying to cut and paste that, the locked haswell just took 
down the network switch so I can't get the rest until tomorrow.

The warning was
[27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 perf_aux_output_begin+0x1ce/0x1f0()

which corresponds to 
        /*
         * Nesting is not supported for AUX area, make sure nested
         * writers are caught early
         */
        if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
                goto err_put;

again just lost access to the machine with the serial console, for the 
full backtrace it will have to wait until I'm not remote.

Vince

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-12  3:15 perf: aux area related crash and warnings Vince Weaver
@ 2015-06-12 18:42 ` Vince Weaver
  2015-06-15 12:20   ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Vince Weaver @ 2015-06-12 18:42 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, 11 Jun 2015, Vince Weaver wrote:

> and while I was trying to cut and paste that, the locked haswell just took 
> down the network switch so I can't get the rest until tomorrow.

here are the full dumps if anyone is interested

the warning is reproducible, the spinlock disaster is not.

[36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
[36298.992915]  lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
[36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
[36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[36299.020606]  ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
[36299.029199]  ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
[36299.037796]  ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
[36299.046338] Call Trace:
[36299.049501]  <NMI>  [<ffffffff816d7229>] dump_stack+0x45/0x57
[36299.056284]  [<ffffffff810c2895>] spin_dump+0x85/0xe0
[36299.062282]  [<ffffffff810c2916>] spin_bug+0x26/0x30
[36299.068111]  [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
[36299.074897]  [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
[36299.081276]  [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
[36299.088340]  [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
[36299.095182]  [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
[36299.102291]  [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
[36299.108987]  [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
[36299.115850]  [<ffffffff8117aa47>] __free_pages+0x37/0x50
[36299.121991]  [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
[36299.128034]  [<ffffffff8116b0e7>] perf_aux_output_end+0xb7/0xf0
[36299.134793]  [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
[36299.141543]  [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
[36299.148482]  [<ffffffff810bc288>] ? check_chain_key+0x128/0x1e0
[36299.155249]  [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
[36299.162273]  [<ffffffff810185d0>] nmi_handle+0xa0/0x150
[36299.168278]  [<ffffffff81018535>] ? nmi_handle+0x5/0x150
[36299.174377]  [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
[36299.180735]  [<ffffffff81018a08>] do_nmi+0x98/0xe0
[36299.186219]  [<ffffffff816e13ef>] end_repeat_nmi+0x1e/0x2e
[36299.192501]  [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.199951]  [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.207410]  [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.214898]  <<EOE>>  [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000
[36299.223222]  [<ffffffff810bf365>] lock_acquire+0xa5/0x130
[36299.229409]  [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
[36299.235551]  [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
[36299.241720]  [<ffffffff8117810b>] bad_range+0x6b/0x150
[36299.247612]  [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
[36299.253757]  [<ffffffff8116a8c5>] ? perf_output_begin+0x5/0x270
[36299.260489]  [<ffffffff8117b379>] __rmqueue+0x109/0x510
[36299.266501]  [<ffffffff8117bf63>] get_page_from_freelist+0x7e3/0xa50
[36299.273680]  [<ffffffff8117c4cb>] __alloc_pages_nodemask+0x1fb/0xa90
[36299.280872]  [<ffffffff811c416b>] ? alloc_pages_vma+0x10b/0x290
[36299.287586]  [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000
[36299.295017]  [<ffffffff811c416b>] alloc_pages_vma+0x10b/0x290
[36299.301544]  [<ffffffff8119f1a4>] ? wp_page_copy.isra.51+0x84/0x4f0
[36299.308613]  [<ffffffff8119f1a4>] wp_page_copy.isra.51+0x84/0x4f0
[36299.315506]  [<ffffffff811a0d94>] ? do_wp_page+0x84/0x6a0
[36299.321633]  [<ffffffff811a0dad>] do_wp_page+0x9d/0x6a0
[36299.327533]  [<ffffffff811a2b69>] ? handle_mm_fault+0x309/0x1770
[36299.334245]  [<ffffffff811a364c>] handle_mm_fault+0xdec/0x1770
[36299.340854]  [<ffffffff81063034>] __do_page_fault+0x1b4/0x460
[36299.347337]  [<ffffffff8106334e>] trace_do_page_fault+0x3e/0xc0
[36299.354021]  [<ffffffff816e1068>] trace_page_fault+0x28/0x30
[36315.086708] ------------[ cut here ]------------


and

[27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 perf_aux_output_begin+0x1ce/0x1f0()
[27716.796157] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel snd_hda_core aes_x86_64 iTCO_wdt lrw snd_hwdep gf128mul snd_pcm iTCO_vendor_support ppdev i915snd_timer glue_helper evdev ablk_helper snd cryptd mei_me mei psmouse pcspkr soundcore serio_raw lpc_ich drm_kms_helper parport_pc battery tpm_tis mfd_core parport drm video tpm i2c_i801 i2c_algo_bit wmi processor button sg sr_mod cdrom sd_mod ehci_pci ehci_hcd ahci libahci libata xhci_pci xhci_hcd e1000e ptp scsi_mod crc32c_intel usbcore pps_core usb_common thermal fan thermal_sys
[27716.873350] CPU: 2 PID: 17655 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
[27716.882671] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[27716.891168]  ffffffff81a230f2 ffff8800cf9afc68 ffffffff816d7229 0000000000000000
[27716.899782]  0000000000000000 ffff8800cf9afca8 ffffffff81072eba 000000000000cf80
[27716.908372]  ffff880036d85a00 ffff88011ea8cf80 ffff880119106000 000000000000cf80
[27716.916981] Call Trace:
[27716.920180]  [<ffffffff816d7229>] dump_stack+0x45/0x57
[27716.926250]  [<ffffffff81072eba>] warn_slowpath_common+0x8a/0xc0
[27716.933244]  [<ffffffff81072faa>] warn_slowpath_null+0x1a/0x20
[27716.939999]  [<ffffffff8116b00e>] perf_aux_output_begin+0x1ce/0x1f0
[27716.947225]  [<ffffffff81164785>] ? perf_event_update_userpage+0x5/0x170
[27716.954875]  [<ffffffff8103767d>] bts_event_add+0x6d/0x150
[27716.961259]  [<ffffffff81162ddb>] event_sched_in.isra.85+0xfb/0x330
[27716.968495]  [<ffffffff811630ca>] group_sched_in+0xba/0x1e0
[27716.974957]  [<ffffffff8101df2a>] ? native_sched_clock+0x2a/0x90
[27716.981888]  [<ffffffff811633c0>] ctx_sched_in+0x1d0/0x3f0
[27716.988274]  [<ffffffff81163649>] perf_event_sched_in+0x69/0xa0
[27716.995082]  [<ffffffff81163bc0>] perf_event_context_sched_in+0x90/0x160
[27717.002714]  [<ffffffff81164148>] __perf_event_task_sched_in+0xa8/0xc0
[27717.010160]  [<ffffffff8109897d>] finish_task_switch+0xed/0x140
[27717.016988]  [<ffffffff81012723>] ? __switch_to+0x263/0x5d0
[27717.023400]  [<ffffffff8109e5a1>] schedule_tail+0x11/0xa0
[27717.029652]  [<ffffffff816df57f>] ret_from_fork+0xf/0x70
[27717.035806] ---[ end trace bdbfeef5d81e5fa4 ]---

which corresponds to 
        /*
         * Nesting is not supported for AUX area, make sure nested
         * writers are caught early
         */
        if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
                goto err_put;


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-12 18:42 ` Vince Weaver
@ 2015-06-15 12:20   ` Peter Zijlstra
  2015-06-15 12:49     ` Alexander Shishkin
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2015-06-15 12:20 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Stephane Eranian, Alexander Shishkin


Alex, any clue?

On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
> On Thu, 11 Jun 2015, Vince Weaver wrote:
> 
> > and while I was trying to cut and paste that, the locked haswell just took 
> > down the network switch so I can't get the rest until tomorrow.
> 
> here are the full dumps if anyone is interested
> 
> the warning is reproducible, the spinlock disaster is not.
> 
> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
> [36298.992915]  lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [36299.020606]  ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
> [36299.029199]  ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
> [36299.037796]  ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
> [36299.046338] Call Trace:
> [36299.049501]  <NMI>  [<ffffffff816d7229>] dump_stack+0x45/0x57
> [36299.056284]  [<ffffffff810c2895>] spin_dump+0x85/0xe0
> [36299.062282]  [<ffffffff810c2916>] spin_bug+0x26/0x30
> [36299.068111]  [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
> [36299.074897]  [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
> [36299.081276]  [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
> [36299.088340]  [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
> [36299.095182]  [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
> [36299.102291]  [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
> [36299.108987]  [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
> [36299.115850]  [<ffffffff8117aa47>] __free_pages+0x37/0x50
> [36299.121991]  [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
> [36299.128034]  [<ffffffff8116b0e7>] perf_aux_output_end+0xb7/0xf0
> [36299.134793]  [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
> [36299.141543]  [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
> [36299.148482]  [<ffffffff810bc288>] ? check_chain_key+0x128/0x1e0
> [36299.155249]  [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
> [36299.162273]  [<ffffffff810185d0>] nmi_handle+0xa0/0x150
> [36299.168278]  [<ffffffff81018535>] ? nmi_handle+0x5/0x150
> [36299.174377]  [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
> [36299.180735]  [<ffffffff81018a08>] do_nmi+0x98/0xe0
> [36299.186219]  [<ffffffff816e13ef>] end_repeat_nmi+0x1e/0x2e
> [36299.192501]  [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
> [36299.199951]  [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
> [36299.207410]  [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
> [36299.214898]  <<EOE>>  [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000
> [36299.223222]  [<ffffffff810bf365>] lock_acquire+0xa5/0x130
> [36299.229409]  [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
> [36299.235551]  [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
> [36299.241720]  [<ffffffff8117810b>] bad_range+0x6b/0x150
> [36299.247612]  [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
> [36299.253757]  [<ffffffff8116a8c5>] ? perf_output_begin+0x5/0x270
> [36299.260489]  [<ffffffff8117b379>] __rmqueue+0x109/0x510
> [36299.266501]  [<ffffffff8117bf63>] get_page_from_freelist+0x7e3/0xa50
> [36299.273680]  [<ffffffff8117c4cb>] __alloc_pages_nodemask+0x1fb/0xa90
> [36299.280872]  [<ffffffff811c416b>] ? alloc_pages_vma+0x10b/0x290
> [36299.287586]  [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000
> [36299.295017]  [<ffffffff811c416b>] alloc_pages_vma+0x10b/0x290
> [36299.301544]  [<ffffffff8119f1a4>] ? wp_page_copy.isra.51+0x84/0x4f0
> [36299.308613]  [<ffffffff8119f1a4>] wp_page_copy.isra.51+0x84/0x4f0
> [36299.315506]  [<ffffffff811a0d94>] ? do_wp_page+0x84/0x6a0
> [36299.321633]  [<ffffffff811a0dad>] do_wp_page+0x9d/0x6a0
> [36299.327533]  [<ffffffff811a2b69>] ? handle_mm_fault+0x309/0x1770
> [36299.334245]  [<ffffffff811a364c>] handle_mm_fault+0xdec/0x1770
> [36299.340854]  [<ffffffff81063034>] __do_page_fault+0x1b4/0x460
> [36299.347337]  [<ffffffff8106334e>] trace_do_page_fault+0x3e/0xc0
> [36299.354021]  [<ffffffff816e1068>] trace_page_fault+0x28/0x30
> [36315.086708] ------------[ cut here ]------------
> 
> 
> and
> 
> [27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 perf_aux_output_begin+0x1ce/0x1f0()
> [27716.796157] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel snd_hda_core aes_x86_64 iTCO_wdt lrw snd_hwdep gf128mul snd_pcm iTCO_vendor_support ppdev i915snd_timer glue_helper evdev ablk_helper snd cryptd mei_me mei psmouse pcspkr soundcore serio_raw lpc_ich drm_kms_helper parport_pc battery tpm_tis mfd_core parport drm video tpm i2c_i801 i2c_algo_bit wmi processor button sg sr_mod cdrom sd_mod ehci_pci ehci_hcd ahci libahci libata xhci_pci xhci_hcd e1000e ptp scsi_mod crc32c_intel usbcore pps_core usb_common thermal fan thermal_sys
> [27716.873350] CPU: 2 PID: 17655 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
> [27716.882671] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [27716.891168]  ffffffff81a230f2 ffff8800cf9afc68 ffffffff816d7229 0000000000000000
> [27716.899782]  0000000000000000 ffff8800cf9afca8 ffffffff81072eba 000000000000cf80
> [27716.908372]  ffff880036d85a00 ffff88011ea8cf80 ffff880119106000 000000000000cf80
> [27716.916981] Call Trace:
> [27716.920180]  [<ffffffff816d7229>] dump_stack+0x45/0x57
> [27716.926250]  [<ffffffff81072eba>] warn_slowpath_common+0x8a/0xc0
> [27716.933244]  [<ffffffff81072faa>] warn_slowpath_null+0x1a/0x20
> [27716.939999]  [<ffffffff8116b00e>] perf_aux_output_begin+0x1ce/0x1f0
> [27716.947225]  [<ffffffff81164785>] ? perf_event_update_userpage+0x5/0x170
> [27716.954875]  [<ffffffff8103767d>] bts_event_add+0x6d/0x150
> [27716.961259]  [<ffffffff81162ddb>] event_sched_in.isra.85+0xfb/0x330
> [27716.968495]  [<ffffffff811630ca>] group_sched_in+0xba/0x1e0
> [27716.974957]  [<ffffffff8101df2a>] ? native_sched_clock+0x2a/0x90
> [27716.981888]  [<ffffffff811633c0>] ctx_sched_in+0x1d0/0x3f0
> [27716.988274]  [<ffffffff81163649>] perf_event_sched_in+0x69/0xa0
> [27716.995082]  [<ffffffff81163bc0>] perf_event_context_sched_in+0x90/0x160
> [27717.002714]  [<ffffffff81164148>] __perf_event_task_sched_in+0xa8/0xc0
> [27717.010160]  [<ffffffff8109897d>] finish_task_switch+0xed/0x140
> [27717.016988]  [<ffffffff81012723>] ? __switch_to+0x263/0x5d0
> [27717.023400]  [<ffffffff8109e5a1>] schedule_tail+0x11/0xa0
> [27717.029652]  [<ffffffff816df57f>] ret_from_fork+0xf/0x70
> [27717.035806] ---[ end trace bdbfeef5d81e5fa4 ]---
> 
> which corresponds to 
>         /*
>          * Nesting is not supported for AUX area, make sure nested
>          * writers are caught early
>          */
>         if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
>                 goto err_put;
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-15 12:20   ` Peter Zijlstra
@ 2015-06-15 12:49     ` Alexander Shishkin
  2015-06-15 20:24       ` Ingo Molnar
  2015-06-16 11:37       ` Alexander Shishkin
  0 siblings, 2 replies; 14+ messages in thread
From: Alexander Shishkin @ 2015-06-15 12:49 UTC (permalink / raw)
  To: Peter Zijlstra, Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

Peter Zijlstra <peterz@infradead.org> writes:

> Alex, any clue?

Let me look into it. Definitely haven't seen anything like that in my
tests.

> On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
>> On Thu, 11 Jun 2015, Vince Weaver wrote:
>> 
>> > and while I was trying to cut and paste that, the locked haswell just took 
>> > down the network switch so I can't get the rest until tomorrow.
>> 
>> here are the full dumps if anyone is interested
>> 
>> the warning is reproducible, the spinlock disaster is not.
>> 
>> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
>> [36298.992915]  lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
>> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
>> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>> [36299.020606]  ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
>> [36299.029199]  ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
>> [36299.037796]  ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
>> [36299.046338] Call Trace:
>> [36299.049501]  <NMI>  [<ffffffff816d7229>] dump_stack+0x45/0x57
>> [36299.056284]  [<ffffffff810c2895>] spin_dump+0x85/0xe0
>> [36299.062282]  [<ffffffff810c2916>] spin_bug+0x26/0x30
>> [36299.068111]  [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
>> [36299.074897]  [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
>> [36299.081276]  [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
>> [36299.088340]  [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
>> [36299.095182]  [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
>> [36299.102291]  [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
>> [36299.108987]  [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
>> [36299.115850]  [<ffffffff8117aa47>] __free_pages+0x37/0x50
>> [36299.121991]  [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0

This one goes to free aux pages from nmi context, looks like aux buffer
was unmapped while the event was running, so here it dropped the last
reference.

>> [36299.128034]  [<ffffffff8116b0e7>] perf_aux_output_end+0xb7/0xf0
>> [36299.134793]  [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
>> [36299.141543]  [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
>> [36299.148482]  [<ffffffff810bc288>] ? check_chain_key+0x128/0x1e0
>> [36299.155249]  [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
>> [36299.162273]  [<ffffffff810185d0>] nmi_handle+0xa0/0x150

[snip]

>> 
>> and
>> 
>> [27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 perf_aux_output_begin+0x1ce/0x1f0()
>> [27716.796157] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel snd_hda_core aes_x86_64 iTCO_wdt lrw snd_hwdep gf128mul snd_pcm iTCO_vendor_support ppdev i915snd_timer glue_helper evdev ablk_helper snd cryptd mei_me mei psmouse pcspkr soundcore serio_raw lpc_ich drm_kms_helper parport_pc battery tpm_tis mfd_core parport drm video tpm i2c_i801 i2c_algo_bit wmi processor button sg sr_mod cdrom sd_mod ehci_pci ehci_hcd ahci libahci libata xhci_pci xhci_hcd e1000e ptp scsi_mod crc32c_intel usbcore pps_core usb_common thermal fan thermal_sys
>> [27716.873350] CPU: 2 PID: 17655 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
>> [27716.882671] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>> [27716.891168]  ffffffff81a230f2 ffff8800cf9afc68 ffffffff816d7229 0000000000000000
>> [27716.899782]  0000000000000000 ffff8800cf9afca8 ffffffff81072eba 000000000000cf80
>> [27716.908372]  ffff880036d85a00 ffff88011ea8cf80 ffff880119106000 000000000000cf80
>> [27716.916981] Call Trace:
>> [27716.920180]  [<ffffffff816d7229>] dump_stack+0x45/0x57
>> [27716.926250]  [<ffffffff81072eba>] warn_slowpath_common+0x8a/0xc0
>> [27716.933244]  [<ffffffff81072faa>] warn_slowpath_null+0x1a/0x20
>> [27716.939999]  [<ffffffff8116b00e>] perf_aux_output_begin+0x1ce/0x1f0
>> [27716.947225]  [<ffffffff81164785>] ? perf_event_update_userpage+0x5/0x170
>> [27716.954875]  [<ffffffff8103767d>] bts_event_add+0x6d/0x150
>> [27716.961259]  [<ffffffff81162ddb>] event_sched_in.isra.85+0xfb/0x330
>> [27716.968495]  [<ffffffff811630ca>] group_sched_in+0xba/0x1e0
>> [27716.974957]  [<ffffffff8101df2a>] ? native_sched_clock+0x2a/0x90
>> [27716.981888]  [<ffffffff811633c0>] ctx_sched_in+0x1d0/0x3f0
>> [27716.988274]  [<ffffffff81163649>] perf_event_sched_in+0x69/0xa0
>> [27716.995082]  [<ffffffff81163bc0>] perf_event_context_sched_in+0x90/0x160
>> [27717.002714]  [<ffffffff81164148>] __perf_event_task_sched_in+0xa8/0xc0
>> [27717.010160]  [<ffffffff8109897d>] finish_task_switch+0xed/0x140
>> [27717.016988]  [<ffffffff81012723>] ? __switch_to+0x263/0x5d0
>> [27717.023400]  [<ffffffff8109e5a1>] schedule_tail+0x11/0xa0
>> [27717.029652]  [<ffffffff816df57f>] ret_from_fork+0xf/0x70
>> [27717.035806] ---[ end trace bdbfeef5d81e5fa4 ]---
>> 
>> which corresponds to 
>>         /*
>>          * Nesting is not supported for AUX area, make sure nested
>>          * writers are caught early
>>          */
>>         if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
>>                 goto err_put;
>> 

This really shouldn't happen, I'd need to check with the fuzzer.

Thanks,
--
Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-15 12:49     ` Alexander Shishkin
@ 2015-06-15 20:24       ` Ingo Molnar
  2015-06-16 11:37       ` Alexander Shishkin
  1 sibling, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2015-06-15 20:24 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian


* Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:

> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > Alex, any clue?
> 
> Let me look into it. Definitely haven't seen anything like that in my
> tests.

That's natural: Vince is running randomize fuzzing tests, so you should look out 
for boundary conditions and 'nonsensical' values that won't normally trigger in 
functional testing.

In fact Vince is using 'directed fuzzing': i.e. the fuzzer is aware of the general 
perf ABI structure and will try to generate partially valid, partially randomized 
requests, to be able to test 'leaf' functionality of the perf ABI as well, which 
would otherwise need astronomical odds to occur in a pure fuzzing test.

These crashes started popping up when Vince added 'AUX area awareness' to the 
fuzzer.

> >> [36299.068111]  [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
> >> [36299.074897]  [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
> >> [36299.081276]  [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
> >> [36299.088340]  [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
> >> [36299.095182]  [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
> >> [36299.102291]  [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
> >> [36299.108987]  [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
> >> [36299.115850]  [<ffffffff8117aa47>] __free_pages+0x37/0x50
> >> [36299.121991]  [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
> 
> This one goes to free aux pages from nmi context, looks like aux buffer was 
> unmapped while the event was running, so here it dropped the last reference.

Yeah, that in itself is an absolute no-no - so I guess refcounting went wrong 
somewhere? (assuming it exists properly).

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-15 12:49     ` Alexander Shishkin
  2015-06-15 20:24       ` Ingo Molnar
@ 2015-06-16 11:37       ` Alexander Shishkin
  2015-06-18  9:09         ` Ingo Molnar
  2015-06-18 10:32         ` Peter Zijlstra
  1 sibling, 2 replies; 14+ messages in thread
From: Alexander Shishkin @ 2015-06-16 11:37 UTC (permalink / raw)
  To: Peter Zijlstra, Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

Alexander Shishkin <alexander.shishkin@linux.intel.com> writes:

> Peter Zijlstra <peterz@infradead.org> writes:
>
>> Alex, any clue?
>
> Let me look into it. Definitely haven't seen anything like that in my
> tests.
>
>> On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
>>> On Thu, 11 Jun 2015, Vince Weaver wrote:
>>> 
>>> > and while I was trying to cut and paste that, the locked haswell just took 
>>> > down the network switch so I can't get the rest until tomorrow.
>>> 
>>> here are the full dumps if anyone is interested
>>> 
>>> the warning is reproducible, the spinlock disaster is not.
>>> 
>>> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
>>> [36298.992915]  lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
>>> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc7+ #155
>>> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>>> [36299.020606]  ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
>>> [36299.029199]  ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
>>> [36299.037796]  ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
>>> [36299.046338] Call Trace:
>>> [36299.049501]  <NMI>  [<ffffffff816d7229>] dump_stack+0x45/0x57
>>> [36299.056284]  [<ffffffff810c2895>] spin_dump+0x85/0xe0
>>> [36299.062282]  [<ffffffff810c2916>] spin_bug+0x26/0x30
>>> [36299.068111]  [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
>>> [36299.074897]  [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
>>> [36299.081276]  [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
>>> [36299.088340]  [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
>>> [36299.095182]  [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
>>> [36299.102291]  [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
>>> [36299.108987]  [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
>>> [36299.115850]  [<ffffffff8117aa47>] __free_pages+0x37/0x50
>>> [36299.121991]  [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
>
> This one goes to free aux pages from nmi context, looks like aux buffer
> was unmapped while the event was running, so here it dropped the last
> reference.

Ok, here's what I propose for this one.

>From b8cd18bc440c318e8b30825bf654c815b42aa1e0 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Date: Tue, 16 Jun 2015 14:14:04 +0300
Subject: [PATCH] perf: Free AUX area pages in rcu callback

Currently, if the user unmaps AUX area while the corresponding event
is active, perf_aux_output_end() may be the last one to drop the aux
area refcount, and end up freeing the pages in NMI context or scheduler's
fast path. Same can happen in the error path of perf_aux_output_begin().

To avoid the bug, this patch moves actual freeing code to rb_free_rcu(),
which will know whether it is called for AUX area or the ring buffer
proper and act accordingly.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
---
 kernel/events/core.c        | 31 ++++++++++++++++++++++++++++++-
 kernel/events/internal.h    |  1 +
 kernel/events/ring_buffer.c |  8 +-------
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index eddf1ed415..5f1cc5976f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4381,7 +4381,36 @@ static void rb_free_rcu(struct rcu_head *rcu_head)
 	struct ring_buffer *rb;
 
 	rb = container_of(rcu_head, struct ring_buffer, rcu_head);
-	rb_free(rb);
+
+	/*
+	 * are we called for AUX or the rb:
+	 * AUX always goes first, then if rb::refcount drops to zero,
+	 * free rb synchronously
+	 */
+	if (atomic_read(&rb->refcount)) {
+		__rb_free_aux(rb);
+
+		/* matches the increment in rb_free_aux() */
+		if (atomic_dec_and_test(&rb->refcount))
+			rb_free(rb);
+	} else {
+		rb_free(rb);
+	}
+}
+
+void rb_free_aux(struct ring_buffer *rb)
+{
+	/*
+	 * hold rb::refcount to make sure rb doesn't disappear
+	 * before aux pages are freed
+	 */
+	if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
+		return;
+
+	if (atomic_dec_and_test(&rb->aux_refcount))
+		call_rcu(&rb->rcu_head, rb_free_rcu);
+	else
+		ring_buffer_put(rb);	/* matches the increment above */
 }
 
 struct ring_buffer *ring_buffer_get(struct perf_event *event)
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 9f6ce9ba4a..7f8242ed85 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -61,6 +61,7 @@ extern void perf_event_wakeup(struct perf_event *event);
 extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 			pgoff_t pgoff, int nr_pages, long watermark, int flags);
 extern void rb_free_aux(struct ring_buffer *rb);
+extern void __rb_free_aux(struct ring_buffer *rb);
 extern struct ring_buffer *ring_buffer_get(struct perf_event *event);
 extern void ring_buffer_put(struct ring_buffer *rb);
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 725c416085..343121e943 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -537,7 +537,7 @@ out:
 	return ret;
 }
 
-static void __rb_free_aux(struct ring_buffer *rb)
+void __rb_free_aux(struct ring_buffer *rb)
 {
 	int pg;
 
@@ -554,12 +554,6 @@ static void __rb_free_aux(struct ring_buffer *rb)
 	rb->aux_nr_pages = 0;
 }
 
-void rb_free_aux(struct ring_buffer *rb)
-{
-	if (atomic_dec_and_test(&rb->aux_refcount))
-		__rb_free_aux(rb);
-}
-
 #ifndef CONFIG_PERF_USE_VMALLOC
 
 /*
-- 
2.1.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-16 11:37       ` Alexander Shishkin
@ 2015-06-18  9:09         ` Ingo Molnar
  2015-06-18  9:35           ` Alexander Shishkin
  2015-06-18 16:30           ` Paul E. McKenney
  2015-06-18 10:32         ` Peter Zijlstra
  1 sibling, 2 replies; 14+ messages in thread
From: Ingo Molnar @ 2015-06-18  9:09 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Paul E. McKenney


* Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:

> +void rb_free_aux(struct ring_buffer *rb)
> +{
> +	/*
> +	 * hold rb::refcount to make sure rb doesn't disappear
> +	 * before aux pages are freed
> +	 */
> +	if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
> +		return;
> +
> +	if (atomic_dec_and_test(&rb->aux_refcount))
> +		call_rcu(&rb->rcu_head, rb_free_rcu);
> +	else
> +		ring_buffer_put(rb);	/* matches the increment above */

Is call_rcu() NMI-safe? I don't think so ...

I think the life time rules of this object are really messed up if they can be 
freed from any fast path. How come the freeing can happen in NMI context? 
Shouldn't the hardware first stop, then we can free things from the system call 
path, or so?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-18  9:09         ` Ingo Molnar
@ 2015-06-18  9:35           ` Alexander Shishkin
  2015-06-18 10:24             ` Peter Zijlstra
  2015-06-18 16:30           ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Alexander Shishkin @ 2015-06-18  9:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Paul E. McKenney

Ingo Molnar <mingo@kernel.org> writes:

> * Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
>
>> +void rb_free_aux(struct ring_buffer *rb)
>> +{
>> +	/*
>> +	 * hold rb::refcount to make sure rb doesn't disappear
>> +	 * before aux pages are freed
>> +	 */
>> +	if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
>> +		return;
>> +
>> +	if (atomic_dec_and_test(&rb->aux_refcount))
>> +		call_rcu(&rb->rcu_head, rb_free_rcu);
>> +	else
>> +		ring_buffer_put(rb);	/* matches the increment above */
>
> Is call_rcu() NMI-safe? I don't think so ...

Actually, we couldn't establish that without Paul's help.

> I think the life time rules of this object are really messed up if
> they can be freed from any fast path. How come the freeing can happen
> in NMI context?

We ended up like this because we didn't want to stop the event once its
AUX area gets unmapped, to be consistent with the behavior of normal
perf events [1]. And somehow I completely missed the fact that this
means freeing stuff in fast path.

> Shouldn't the hardware first stop, then we can free things from the system call 
> path, or so?

Yes, that's how we'll do it.

[1] http://marc.info/?l=linux-kernel&m=141017607804348

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-18  9:35           ` Alexander Shishkin
@ 2015-06-18 10:24             ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2015-06-18 10:24 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Paul E. McKenney

On Thu, Jun 18, 2015 at 12:35:31PM +0300, Alexander Shishkin wrote:
> Ingo Molnar <mingo@kernel.org> writes:
> 
> > * Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
> >
> >> +void rb_free_aux(struct ring_buffer *rb)
> >> +{
> >> +	/*
> >> +	 * hold rb::refcount to make sure rb doesn't disappear
> >> +	 * before aux pages are freed
> >> +	 */
> >> +	if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
> >> +		return;
> >> +
> >> +	if (atomic_dec_and_test(&rb->aux_refcount))
> >> +		call_rcu(&rb->rcu_head, rb_free_rcu);
> >> +	else
> >> +		ring_buffer_put(rb);	/* matches the increment above */
> >
> > Is call_rcu() NMI-safe? I don't think so ...
> 
> Actually, we couldn't establish that without Paul's help.

I was fairly sure it was not, Paul did confirm last night it is indeed
not.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-16 11:37       ` Alexander Shishkin
  2015-06-18  9:09         ` Ingo Molnar
@ 2015-06-18 10:32         ` Peter Zijlstra
  2015-06-18 13:11           ` Alexander Shishkin
                             ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Peter Zijlstra @ 2015-06-18 10:32 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Tue, Jun 16, 2015 at 02:37:06PM +0300, Alexander Shishkin wrote:
> Ok, here's what I propose for this one.

As discussed yesterday, I think we want to change to how we do reference
counting for the regular buffers, but that will require quite a bit of
work.

For the interim we can avoid the issue by something like the below;
compile tested only.

---
 kernel/events/core.c        |  8 --------
 kernel/events/internal.h    | 10 ++++++++++
 kernel/events/ring_buffer.c | 27 +++++++++++++++++++++++++--
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1e33b9141f03..76b7694b7909 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4352,14 +4352,6 @@ static void ring_buffer_wakeup(struct perf_event *event)
 	rcu_read_unlock();
 }
 
-static void rb_free_rcu(struct rcu_head *rcu_head)
-{
-	struct ring_buffer *rb;
-
-	rb = container_of(rcu_head, struct ring_buffer, rcu_head);
-	rb_free(rb);
-}
-
 struct ring_buffer *ring_buffer_get(struct perf_event *event)
 {
 	struct ring_buffer *rb;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2deb24c7a40d..2bbad9c1274c 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,6 +11,7 @@
 struct ring_buffer {
 	atomic_t			refcount;
 	struct rcu_head			rcu_head;
+	struct irq_work			irq_work;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
 	int				page_order;	/* allocation order  */
@@ -55,6 +56,15 @@ struct ring_buffer {
 };
 
 extern void rb_free(struct ring_buffer *rb);
+
+static inline void rb_free_rcu(struct rcu_head *rcu_head)
+{
+	struct ring_buffer *rb;
+
+	rb = container_of(rcu_head, struct ring_buffer, rcu_head);
+	rb_free(rb);
+}
+
 extern struct ring_buffer *
 rb_alloc(int nr_pages, long watermark, int cpu, int flags);
 extern void perf_event_wakeup(struct perf_event *event);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 96472824a752..b2be01b1aa9d 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -221,6 +221,8 @@ void perf_output_end(struct perf_output_handle *handle)
 	rcu_read_unlock();
 }
 
+static void rb_irq_work(struct irq_work *work);
+
 static void
 ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 {
@@ -241,6 +243,16 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
+	init_irq_work(&rb->irq_work, rb_irq_work);
+}
+
+static void ring_buffer_put_async(struct ring_buffer *rb)
+{
+	if (!atomic_dec_and_test(&rb->refcount))
+		return;
+
+	rb->rcu_head.next = (void *)rb;
+	irq_work_queue(&rb->irq_work);
 }
 
 /*
@@ -319,7 +331,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	rb_free_aux(rb);
 
 err:
-	ring_buffer_put(rb);
+	ring_buffer_put_async(rb);
 	handle->event = NULL;
 
 	return NULL;
@@ -370,7 +382,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 
 	local_set(&rb->aux_nest, 0);
 	rb_free_aux(rb);
-	ring_buffer_put(rb);
+	ring_buffer_put_async(rb);
 }
 
 /*
@@ -557,7 +569,18 @@ static void __rb_free_aux(struct ring_buffer *rb)
 void rb_free_aux(struct ring_buffer *rb)
 {
 	if (atomic_dec_and_test(&rb->aux_refcount))
+		irq_work_queue(&rb->irq_work);
+}
+
+static void rb_irq_work(struct irq_work *work)
+{
+	struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
+
+	if (!atomic_read(&rb->aux_refcount))
 		__rb_free_aux(rb);
+
+	if (rb->rcu_head.next == (void *)rb)
+		call_rcu(&rb->rcu_head, rb_free_rcu);
 }
 
 #ifndef CONFIG_PERF_USE_VMALLOC

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-18 10:32         ` Peter Zijlstra
@ 2015-06-18 13:11           ` Alexander Shishkin
  2015-06-19 14:21           ` Vince Weaver
  2015-07-06 15:34           ` [tip:perf/urgent] perf: Fix AUX buffer refcounting tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 14+ messages in thread
From: Alexander Shishkin @ 2015-06-18 13:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Jun 16, 2015 at 02:37:06PM +0300, Alexander Shishkin wrote:
>> Ok, here's what I propose for this one.
>
> As discussed yesterday, I think we want to change to how we do reference
> counting for the regular buffers, but that will require quite a bit of
> work.
>
> For the interim we can avoid the issue by something like the below;
> compile tested only.

Looks good and works fine on my tests. Only I'd maybe add a small
comment on why reusing rcu_head.next doesn't set off call_rcu()
debugging. Other than that, fwiw,

Reviewed-and-tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>

Thanks,
--
Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-18  9:09         ` Ingo Molnar
  2015-06-18  9:35           ` Alexander Shishkin
@ 2015-06-18 16:30           ` Paul E. McKenney
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2015-06-18 16:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander Shishkin, Peter Zijlstra, Vince Weaver, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Jun 18, 2015 at 11:09:55AM +0200, Ingo Molnar wrote:
> 
> * Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
> 
> > +void rb_free_aux(struct ring_buffer *rb)
> > +{
> > +	/*
> > +	 * hold rb::refcount to make sure rb doesn't disappear
> > +	 * before aux pages are freed
> > +	 */
> > +	if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
> > +		return;
> > +
> > +	if (atomic_dec_and_test(&rb->aux_refcount))
> > +		call_rcu(&rb->rcu_head, rb_free_rcu);
> > +	else
> > +		ring_buffer_put(rb);	/* matches the increment above */
> 
> Is call_rcu() NMI-safe? I don't think so ...

Definitely not!  ;-)

> I think the life time rules of this object are really messed up if they can be 
> freed from any fast path. How come the freeing can happen in NMI context? 
> Shouldn't the hardware first stop, then we can free things from the system call 
> path, or so?
> 
> Thanks,
> 
> 	Ingo
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf: aux area related crash and warnings
  2015-06-18 10:32         ` Peter Zijlstra
  2015-06-18 13:11           ` Alexander Shishkin
@ 2015-06-19 14:21           ` Vince Weaver
  2015-07-06 15:34           ` [tip:perf/urgent] perf: Fix AUX buffer refcounting tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 14+ messages in thread
From: Vince Weaver @ 2015-06-19 14:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Shishkin, Vince Weaver, linux-kernel, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, 18 Jun 2015, Peter Zijlstra wrote:

> On Tue, Jun 16, 2015 at 02:37:06PM +0300, Alexander Shishkin wrote:
> > Ok, here's what I propose for this one.
> 
> As discussed yesterday, I think we want to change to how we do reference
> counting for the regular buffers, but that will require quite a bit of
> work.
> 
> For the interim we can avoid the issue by something like the below;
> compile tested only.

I did manage to crash the system with Alexander Shishkin's patch applied 
(See below).

I will try with this updated patch instead, but the bug usually takes a 
few days  to trigger with the fuzzer and I won't be able to let it run 
that long this time.

Vince

[183592.949539] BUG: spinlock recursion on CPU#2, perf_fuzzer/434
[183592.956559]  lock: pool_lock+0x0/0x40, .magic: dead4ead, .owner: perf_fuzzer/434, .owner_cpu: 2
[183592.966783] CPU: 2 PID: 434 Comm: perf_fuzzer Tainted: G        W       4.1.0-rc8+ #157
[183592.976228] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[183592.984997]  ffffffff81cc2120 ffff88011ea85970 ffffffff816d7419 0000000000000053
[183592.993871]  ffff8800cf26a8d0 ffff88011ea85990 ffffffff810c28a5 ffffffff81cc2120
[183593.002787]  ffffffff81a1e4d1 ffff88011ea859b0 ffffffff810c2926 ffffffff81cc2120
[183593.011632] Call Trace:
[183593.015016]  <NMI>  [<ffffffff816d7419>] dump_stack+0x45/0x57
[183593.021966]  [<ffffffff810c28a5>] spin_dump+0x85/0xe0
[183593.028153]  [<ffffffff810c2926>] spin_bug+0x26/0x30
[183593.034234]  [<ffffffff810c2adf>] do_raw_spin_lock+0x13f/0x180
[183593.041233]  [<ffffffff816de8d9>] _raw_spin_lock+0x39/0x40
[183593.047849]  [<ffffffff81428d2d>] ? __debug_object_init+0xad/0x450
[183593.055189]  [<ffffffff81428d2d>] __debug_object_init+0xad/0x450
[183593.062375]  [<ffffffff814290eb>] debug_object_init+0x1b/0x20
[183593.069267]  [<ffffffff810d8fb8>] rcuhead_fixup_activate+0x28/0x40
[183593.076638]  [<ffffffff81429201>] debug_object_activate+0xf1/0x1e0
[183593.083969]  [<ffffffff8115e690>] ? free_event_rcu+0x40/0x40
[183593.090781]  [<ffffffff810dc8cb>] __call_rcu.constprop.58+0x3b/0x340
[183593.098321]  [<ffffffff8115711c>] ? irq_work_queue+0x6c/0x80
[183593.105123]  [<ffffffff810dcc27>] call_rcu_sched+0x17/0x20
[183593.111709]  [<ffffffff81166680>] rb_free_aux+0x40/0x90
[183593.118046]  [<ffffffff8116b067>] perf_aux_output_end+0xb7/0xf0
[183593.125089]  [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
[183593.132069]  [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
[183593.139335]  [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
[183593.146687]  [<ffffffff810185d0>] nmi_handle+0xa0/0x150
[183593.153012]  [<ffffffff81018535>] ? nmi_handle+0x5/0x150
[183593.159406]  [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
[183593.166117]  [<ffffffff81018a08>] do_nmi+0x98/0xe0
[183593.171983]  [<ffffffff816e15ef>] end_repeat_nmi+0x1e/0x2e
[183593.178613]  [<ffffffff810c2b70>] ? do_raw_spin_trylock+0x50/0x50
[183593.185823]  [<ffffffff810c2b70>] ? do_raw_spin_trylock+0x50/0x50
[183593.193059]  [<ffffffff810c2b70>] ? do_raw_spin_trylock+0x50/0x50
[183593.200306]  <<EOE>>  [<ffffffff816dea7b>] ? _raw_spin_unlock+0x2b/0x40
[183593.208107]  [<ffffffff81428ded>] __debug_object_init+0x16d/0x450
[183593.215328]  [<ffffffff8142919a>] ? debug_object_activate+0x8a/0x1e0
[183593.222812]  [<ffffffff814291ea>] ? debug_object_activate+0xda/0x1e0
[183593.230328]  [<ffffffff814290eb>] debug_object_init+0x1b/0x20
[183593.237143]  [<ffffffff810d8fb8>] rcuhead_fixup_activate+0x28/0x40
[183593.244413]  [<ffffffff81429201>] debug_object_activate+0xf1/0x1e0
[183593.251702]  [<ffffffff8115e650>] ? perf_unpin_context+0x40/0x40
[183593.258799]  [<ffffffff810dc8cb>] __call_rcu.constprop.58+0x3b/0x340
[183593.266311]  [<ffffffff81170b57>] ? __static_key_slow_dec+0x27/0xc0
[183593.273694]  [<ffffffff810dcc27>] call_rcu_sched+0x17/0x20
[183593.280180]  [<ffffffff81165439>] _free_event+0x109/0x290
[183593.286619]  [<ffffffff811656cf>] put_event+0x10f/0x160
[183593.292855]  [<ffffffff811655f0>] ? put_event+0x30/0x160
[183593.299153]  [<ffffffff81165755>] perf_release+0x15/0x20
[183593.305442]  [<ffffffff811edeec>] __fput+0xdc/0x1f0
[183593.311296]  [<ffffffff811ee04e>] ____fput+0xe/0x10
[183593.317138]  [<ffffffff81091577>] task_work_run+0xb7/0xf0
[183593.323509]  [<ffffffff81014189>] do_notify_resume+0x59/0x80
[183593.330207]  [<ffffffff816df550>] int_signal+0x12/0x17
[183602.817462] ------------[ cut here ]------------

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip:perf/urgent] perf: Fix AUX buffer refcounting
  2015-06-18 10:32         ` Peter Zijlstra
  2015-06-18 13:11           ` Alexander Shishkin
  2015-06-19 14:21           ` Vince Weaver
@ 2015-07-06 15:34           ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-07-06 15:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: eranian, tglx, mingo, torvalds, hpa, vincent.weaver,
	linux-kernel, peterz, alexander.shishkin, acme

Commit-ID:  57ffc5ca679f499f4704fd9b6a372916f59930ee
Gitweb:     http://git.kernel.org/tip/57ffc5ca679f499f4704fd9b6a372916f59930ee
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 18 Jun 2015 12:32:49 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 6 Jul 2015 14:08:30 +0200

perf: Fix AUX buffer refcounting

Its currently possible to drop the last refcount to the aux buffer
from NMI context, which results in the expected fireworks.

The refcounting needs a bigger overhaul, but to cure the immediate
problem, delay the freeing by using an irq_work.

Reviewed-and-tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150618103249.GK19282@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c        |  8 --------
 kernel/events/internal.h    | 10 ++++++++++
 kernel/events/ring_buffer.c | 27 +++++++++++++++++++++++++--
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e965cfa..d3dae34 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4358,14 +4358,6 @@ static void ring_buffer_wakeup(struct perf_event *event)
 	rcu_read_unlock();
 }
 
-static void rb_free_rcu(struct rcu_head *rcu_head)
-{
-	struct ring_buffer *rb;
-
-	rb = container_of(rcu_head, struct ring_buffer, rcu_head);
-	rb_free(rb);
-}
-
 struct ring_buffer *ring_buffer_get(struct perf_event *event)
 {
 	struct ring_buffer *rb;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2deb24c..2bbad9c 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,6 +11,7 @@
 struct ring_buffer {
 	atomic_t			refcount;
 	struct rcu_head			rcu_head;
+	struct irq_work			irq_work;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
 	int				page_order;	/* allocation order  */
@@ -55,6 +56,15 @@ struct ring_buffer {
 };
 
 extern void rb_free(struct ring_buffer *rb);
+
+static inline void rb_free_rcu(struct rcu_head *rcu_head)
+{
+	struct ring_buffer *rb;
+
+	rb = container_of(rcu_head, struct ring_buffer, rcu_head);
+	rb_free(rb);
+}
+
 extern struct ring_buffer *
 rb_alloc(int nr_pages, long watermark, int cpu, int flags);
 extern void perf_event_wakeup(struct perf_event *event);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 9647282..b2be01b 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -221,6 +221,8 @@ void perf_output_end(struct perf_output_handle *handle)
 	rcu_read_unlock();
 }
 
+static void rb_irq_work(struct irq_work *work);
+
 static void
 ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 {
@@ -241,6 +243,16 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
+	init_irq_work(&rb->irq_work, rb_irq_work);
+}
+
+static void ring_buffer_put_async(struct ring_buffer *rb)
+{
+	if (!atomic_dec_and_test(&rb->refcount))
+		return;
+
+	rb->rcu_head.next = (void *)rb;
+	irq_work_queue(&rb->irq_work);
 }
 
 /*
@@ -319,7 +331,7 @@ err_put:
 	rb_free_aux(rb);
 
 err:
-	ring_buffer_put(rb);
+	ring_buffer_put_async(rb);
 	handle->event = NULL;
 
 	return NULL;
@@ -370,7 +382,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 
 	local_set(&rb->aux_nest, 0);
 	rb_free_aux(rb);
-	ring_buffer_put(rb);
+	ring_buffer_put_async(rb);
 }
 
 /*
@@ -557,7 +569,18 @@ static void __rb_free_aux(struct ring_buffer *rb)
 void rb_free_aux(struct ring_buffer *rb)
 {
 	if (atomic_dec_and_test(&rb->aux_refcount))
+		irq_work_queue(&rb->irq_work);
+}
+
+static void rb_irq_work(struct irq_work *work)
+{
+	struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
+
+	if (!atomic_read(&rb->aux_refcount))
 		__rb_free_aux(rb);
+
+	if (rb->rcu_head.next == (void *)rb)
+		call_rcu(&rb->rcu_head, rb_free_rcu);
 }
 
 #ifndef CONFIG_PERF_USE_VMALLOC

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-07-06 15:35 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-12  3:15 perf: aux area related crash and warnings Vince Weaver
2015-06-12 18:42 ` Vince Weaver
2015-06-15 12:20   ` Peter Zijlstra
2015-06-15 12:49     ` Alexander Shishkin
2015-06-15 20:24       ` Ingo Molnar
2015-06-16 11:37       ` Alexander Shishkin
2015-06-18  9:09         ` Ingo Molnar
2015-06-18  9:35           ` Alexander Shishkin
2015-06-18 10:24             ` Peter Zijlstra
2015-06-18 16:30           ` Paul E. McKenney
2015-06-18 10:32         ` Peter Zijlstra
2015-06-18 13:11           ` Alexander Shishkin
2015-06-19 14:21           ` Vince Weaver
2015-07-06 15:34           ` [tip:perf/urgent] perf: Fix AUX buffer refcounting tip-bot for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.