linux-kernel.vger.kernel.org archive mirror
* perf: another perf_fuzzer generated lockup
@ 2015-05-08  4:40 Vince Weaver
  2015-05-09  0:04 ` Stephane Eranian
  2015-05-27 12:52 ` Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Vince Weaver @ 2015-05-08  4:40 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Paul Mackerras


This is a new one, I think; I hit it on the Haswell machine running 
4.1-rc2.

The backtrace is complex enough that I'm not really sure what's going on 
here.

The fuzzer has been having weird issues where it receives overflow 
signals for invalid fds.  This seems to happen when an overflow signal 
interrupts the fuzzer mid-fork.  The fuzzer code doesn't handle this 
well: it attempts to call exit() and/or kill the child from the signal 
handler that interrupted the fork(), and that doesn't always go well.  
I'm not sure if this is related, just that some of those actions appear 
in the backtrace.


[33864.529861] ------------[ cut here ]------------
[33864.534824] WARNING: CPU: 1 PID: 9852 at kernel/watchdog.c:302 watchdog_overflow_callback+0x92/0xc0()
[33864.544682] Watchdog detected hard LOCKUP on cpu 1
[33864.549635] Modules linked in:
[33864.552943]  fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core crct10dif_pclmul snd_hwdep crc32_pclmul ghash_clmulni_intel snd_pcm aesni_intel aes_x86_64 lrw gf128mul evdev i915 iTCO_wdt iTCO_vendor_support glue_helper snd_timer ppdev ablk_helper psmouse drm_kms_helper cryptd snd drm pcspkr serio_raw lpc_ich soundcore parport_pc xhci_pci battery video processor i2c_i801 mei_me mei wmi i2c_algo_bit tpm_tis mfd_core xhci_hcd tpm parport button sg sr_mod sd_mod cdrom ehci_pci ahci ehci_hcd libahci libata e1000e ptp usbcore crc32c_intel scsi_mod fan usb_common pps_core thermal thermal_sys
[33864.622413] CPU: 1 PID: 9852 Comm: perf_fuzzer Not tainted 4.1.0-rc2+ #142
[33864.629776] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[33864.637685]  ffffffff81a209b5 ffff88011ea45aa0 ffffffff816d51d3 0000000000000000
[33864.645709]  ffff88011ea45af0 ffff88011ea45ae0 ffffffff81072dfa ffff88011ea45ac0
[33864.653731]  ffff880119b8f800 0000000000000000 ffff88011ea45c40 ffff88011ea45ef8
[33864.661783] Call Trace:
[33864.664409]  <NMI>  [<ffffffff816d51d3>] dump_stack+0x45/0x57
[33864.670618]  [<ffffffff81072dfa>] warn_slowpath_common+0x8a/0xc0
[33864.677071]  [<ffffffff81072e76>] warn_slowpath_fmt+0x46/0x50
[33864.683202]  [<ffffffff8104db64>] ? native_apic_wait_icr_idle+0x24/0x30
[33864.690280]  [<ffffffff81124f22>] watchdog_overflow_callback+0x92/0xc0
[33864.697294]  [<ffffffff811673b1>] __perf_event_overflow+0x91/0x270
[33864.703916]  [<ffffffff811673f9>] ? __perf_event_overflow+0xd9/0x270
[33864.710696]  [<ffffffff8102b60a>] ? x86_perf_event_set_period+0xda/0x180
[33864.717842]  [<ffffffff81167f89>] perf_event_overflow+0x19/0x20
[33864.724195]  [<ffffffff810338f2>] intel_pmu_handle_irq+0x1e2/0x450
[33864.730840]  [<ffffffff8102a41b>] perf_event_nmi_handler+0x2b/0x50
[33864.737436]  [<ffffffff810185d0>] nmi_handle+0xa0/0x150
[33864.743025]  [<ffffffff81018535>] ? nmi_handle+0x5/0x150
[33864.748733]  [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
[33864.754705]  [<ffffffff81018a08>] do_nmi+0x98/0xe0
[33864.759858]  [<ffffffff816df42f>] end_repeat_nmi+0x1e/0x2e
[33864.765746]  [<ffffffff810bc1bb>] ? check_chain_key+0xdb/0x1e0
[33864.772004]  [<ffffffff810bc1bb>] ? check_chain_key+0xdb/0x1e0
[33864.778253]  [<ffffffff810bc1bb>] ? check_chain_key+0xdb/0x1e0
[33864.784498]  <<EOE>>  <IRQ>  [<ffffffff810bdd09>] __lock_acquire.isra.31+0x3b9/0x1000
[33864.792950]  [<ffffffff810bdd09>] ? __lock_acquire.isra.31+0x3b9/0x1000
[33864.800045]  [<ffffffff810bf2e5>] lock_acquire+0xa5/0x130
[33864.805817]  [<ffffffff81080afe>] ? __lock_task_sighand+0x6e/0x110
[33864.812468]  [<ffffffff81080aaa>] ? __lock_task_sighand+0x1a/0x110
[33864.819084]  [<ffffffff816dc721>] _raw_spin_lock+0x31/0x40
[33864.824979]  [<ffffffff81080afe>] ? __lock_task_sighand+0x6e/0x110
[33864.831623]  [<ffffffff81080afe>] __lock_task_sighand+0x6e/0x110
[33864.838096]  [<ffffffff81080aaa>] ? __lock_task_sighand+0x1a/0x110
[33864.845314]  [<ffffffff81080bcc>] do_send_sig_info+0x2c/0x80
[33864.851949]  [<ffffffff81167807>] ? perf_swevent_event+0x67/0x90
[33864.858980]  [<ffffffff8120092f>] send_sigio_to_task+0x12f/0x1a0
[33864.866005]  [<ffffffff81200805>] ? send_sigio_to_task+0x5/0x1a0
[33864.873047]  [<ffffffff812010c6>] ? send_sigio+0x56/0x100
[33864.879411]  [<ffffffff8120111e>] send_sigio+0xae/0x100
[33864.885564]  [<ffffffff81201207>] kill_fasync+0x97/0xf0
[33864.891713]  [<ffffffff8120117f>] ? kill_fasync+0xf/0xf0
[33864.897983]  [<ffffffff811667d4>] perf_event_wakeup+0xd4/0xf0
[33864.904662]  [<ffffffff81166705>] ? perf_event_wakeup+0x5/0xf0
[33864.911490]  [<ffffffff811668d0>] ? perf_pending_event+0xe0/0x110
[33864.918580]  [<ffffffff811668d0>] perf_pending_event+0xe0/0x110
[33864.925494]  [<ffffffff81156eac>] irq_work_run_list+0x4c/0x80
[33864.932197]  [<ffffffff81156ef8>] irq_work_run+0x18/0x40
[33864.938469]  [<ffffffff8101953f>] smp_trace_irq_work_interrupt+0x3f/0xc0
[33864.946263]  [<ffffffff816dea3e>] trace_irq_work_interrupt+0x6e/0x80
[33864.953646]  <EOI>  [<ffffffff811a1b47>] ? copy_page_range+0x527/0x9a0
[33864.961287]  [<ffffffff811a1b22>] ? copy_page_range+0x502/0x9a0
[33864.968265]  [<ffffffff810710d2>] copy_process.part.23+0xc92/0x1b80
[33864.975589]  [<ffffffff810813be>] ? SYSC_kill+0x8e/0x230
[33864.981879]  [<ffffffff81072178>] do_fork+0xd8/0x420
[33864.987807]  [<ffffffff812007e3>] ? f_setown+0x83/0xa0
[33864.993953]  [<ffffffff81200d30>] ? SyS_fcntl+0x310/0x650
[33865.000348]  [<ffffffff81417a23>] ? lockdep_sys_exit_thunk+0x12/0x14
[33865.007781]  [<ffffffff81072546>] SyS_clone+0x16/0x20
[33865.013830]  [<ffffffff816dd1b2>] system_call_fastpath+0x16/0x7a
[33865.020843] ---[ end trace d3bd7d73656f3cba ]---
[33865.026418] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 496.487 msecs
[33865.035874] perf interrupt took too long (3879951 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
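
(Editorial note: the last message means the kernel permanently lowered
the sampling-rate sysctl after the slow NMI handler.  A quick way to
inspect, and as root restore, the ceiling after a fuzzing run -- 100000
is the stock default:)

```shell
# Show the current ceiling the kernel throttled down to:
cat /proc/sys/kernel/perf_event_max_sample_rate 2>/dev/null || echo "not available"

# To restore the stock default of 100000 samples/sec (requires root):
#   sysctl -w kernel.perf_event_max_sample_rate=100000
```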


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: perf: another perf_fuzzer generated lockup
  2015-05-08  4:40 perf: another perf_fuzzer generated lockup Vince Weaver
@ 2015-05-09  0:04 ` Stephane Eranian
  2015-05-11 21:19   ` Vince Weaver
  2015-05-27 12:52 ` Peter Zijlstra
  1 sibling, 1 reply; 5+ messages in thread
From: Stephane Eranian @ 2015-05-09  0:04 UTC (permalink / raw)
  To: Vince Weaver
  Cc: LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Paul Mackerras

Vince,

On Thu, May 7, 2015 at 9:40 PM, Vince Weaver <vincent.weaver@maine.edu> wrote:
>
>
> This is a new one, I think; I hit it on the Haswell machine running
> 4.1-rc2.
>
> The backtrace is complex enough that I'm not really sure what's going on
> here.
>
> The fuzzer has been having weird issues where it receives overflow
> signals for invalid fds.  This seems to happen when an overflow signal
> interrupts the fuzzer mid-fork.  The fuzzer code doesn't handle this
> well: it attempts to call exit() and/or kill the child from the signal
> handler that interrupted the fork(), and that doesn't always go well.
> I'm not sure if this is related, just that some of those actions appear
> in the backtrace.
>
>
Is there a way to figure out how the fuzzer had programmed the PMU
to get there (besides adding a PMU state dump to the kernel crashdump)?




* Re: perf: another perf_fuzzer generated lockup
  2015-05-09  0:04 ` Stephane Eranian
@ 2015-05-11 21:19   ` Vince Weaver
  0 siblings, 0 replies; 5+ messages in thread
From: Vince Weaver @ 2015-05-11 21:19 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Vince Weaver, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Paul Mackerras

On Fri, 8 May 2015, Stephane Eranian wrote:

> Vince,
> 
> On Thu, May 7, 2015 at 9:40 PM, Vince Weaver <vincent.weaver@maine.edu> wrote:
> >
> >
> > This is a new one, I think; I hit it on the Haswell machine running
> > 4.1-rc2.
> >
> > The backtrace is complex enough that I'm not really sure what's going on
> > here.
> >
> > The fuzzer has been having weird issues where it receives overflow
> > signals for invalid fds.  This seems to happen when an overflow signal
> > interrupts the fuzzer mid-fork.  The fuzzer code doesn't handle this
> > well: it attempts to call exit() and/or kill the child from the signal
> > handler that interrupted the fork(), and that doesn't always go well.
> > I'm not sure if this is related, just that some of those actions appear
> > in the backtrace.
> >
> >
> Is there a way to figure out how the fuzzer had programmed the PMU
> to get there (besides adding a PMU state dump to the kernel crashdump)?

Not easily.  In theory the fuzzer can regenerate the state from its random 
seed, but some of these bugs seem to be timing-related or race conditions, 
so they don't always replicate.

I can also make the fuzzer dump its state, but often it has 100+ events 
active and there's no way of knowing which ones are currently scheduled 
onto the CPU.

Dumping the PMU state might help, but at the same time there are all the 
other events going on, such as software and tracepoint events, and they 
might all be contributing.

This particular bug almost replicates; the system definitely pauses for a 
bit even if not long enough to trigger the watchdog.

I've been meaning to work on it more, but we just finished with finals, so 
I'm stuck doing more or less nothing but grading this next week.  After 
that I should have some time to work on this issue plus a couple of other 
warnings the fuzzer has been showing.

Vince


* Re: perf: another perf_fuzzer generated lockup
  2015-05-08  4:40 perf: another perf_fuzzer generated lockup Vince Weaver
  2015-05-09  0:04 ` Stephane Eranian
@ 2015-05-27 12:52 ` Peter Zijlstra
  2015-05-27 13:58   ` Vince Weaver
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2015-05-27 12:52 UTC (permalink / raw)
  To: Vince Weaver
  Cc: LKML, Stephane Eranian, Arnaldo Carvalho de Melo, Jiri Olsa,
	Ingo Molnar, Paul Mackerras

On Fri, May 08, 2015 at 12:40:48AM -0400, Vince Weaver wrote:
> 
> This is a new one, I think; I hit it on the Haswell machine running
> 4.1-rc2.
> 

Vince, should I still be running fast_repro99.sh?  Or is there another
preferred way to start the beast?


* Re: perf: another perf_fuzzer generated lockup
  2015-05-27 12:52 ` Peter Zijlstra
@ 2015-05-27 13:58   ` Vince Weaver
  0 siblings, 0 replies; 5+ messages in thread
From: Vince Weaver @ 2015-05-27 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vince Weaver, LKML, Stephane Eranian, Arnaldo Carvalho de Melo,
	Jiri Olsa, Ingo Molnar, Paul Mackerras

On Wed, 27 May 2015, Peter Zijlstra wrote:

> On Fri, May 08, 2015 at 12:40:48AM -0400, Vince Weaver wrote:
> > 
> > This is a new one, I think; I hit it on the Haswell machine running
> > 4.1-rc2.
> > 
> 
> Vince, should I still be running fast_repro99.sh?  Or is there another
> preferred way to start the beast?

Yes, that's still probably the best way to start it.

I've recently been overhauling the fuzzer, including a massive rework of 
the mmap() code (to try to exercise the AUX codepath), so if it starts 
segfaulting on you, that's probably why.

Vince

