linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* perf/tracepoint yet another lockup
@ 2013-11-11 17:03 Vince Weaver
  2013-11-11 22:28 ` Vince Weaver
  0 siblings, 1 reply; 4+ messages in thread
From: Vince Weaver @ 2013-11-11 17:03 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Frederic Weisbecker, Vince Weaver,
	Steven Rostedt, Dave Jones

Hello

so I modified my perf_fuzzer to not try tracepoint event 0x18 
(irq_work_exit which is the event causing problems in the other thread).

But I got another lockup, presumbaly with a different trace event.  
I'll try to narrow down which one caused it.

Should we just be disabling the perf/tracepoint interface if not run as 
root at this point?

Vince


[243573.968001] ------------[ cut here ]------------
[243573.968001] WARNING: CPU: 0 PID: 4159 at kernel/watchdog.c:245 watchdog_ove)
[243573.968001] Watchdog detected hard LOCKUP on cpu 0
[243573.968001] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powen
[243573.968001] CPU: 0 PID: 4159 Comm: perf_fuzzer Tainted: G        W    3.12.2
[243573.968001] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, B2
[243573.968001]  00000000000000f5 ffff88011fc07c48 ffffffff8151aad6 ffff88011fc8
[243573.968001]  ffffffff8103cd68 0000000100000000 ffffffff810a101c ffff88011b30
[243573.968001]  ffff88011fc07ef8 ffff88011fc07d58 0000000000000000 ffff88011fc8
[243573.968001] Call Trace:
[243573.968001]  <NMI>  [<ffffffff8151aad6>] ? dump_stack+0x41/0x53
[243573.968001]  [<ffffffff8103cd68>] ? warn_slowpath_common+0x81/0x9c
[243573.968001]  [<ffffffff810a101c>] ? watchdog_overflow_callback+0x9b/0xa6
[243573.968001]  [<ffffffff8103ce25>] ? warn_slowpath_fmt+0x46/0x48
[243573.968001]  [<ffffffff810c9ced>] ? perf_prepare_sample+0x62/0x2f4
[243573.968001]  [<ffffffff810a101c>] ? watchdog_overflow_callback+0x9b/0xa6
[243573.968001]  [<ffffffff810ca4a2>] ? __perf_event_overflow+0x12e/0x1ab
[243573.968001]  [<ffffffff810167d3>] ? intel_pmu_lbr_read+0x253/0x349
[243573.968001]  [<ffffffff81018c95>] ? intel_pmu_handle_irq+0x2b8/0x34d
[243573.968001]  [<ffffffff810c79e3>] ? perf_event_context_sched_in+0x70/0x8d
[243573.968001]  [<ffffffff8151edfc>] ? nmi_handle+0xa8/0x13a
[243573.968001]  [<ffffffff8151f473>] ? perf_event_nmi_handler+0x24/0x3d
[243573.968001]  [<ffffffff8151edb2>] ? nmi_handle+0x5e/0x13a
[243573.968001]  [<ffffffff8151ef35>] ? do_nmi+0xa7/0x2bd
[243573.968001]  [<ffffffff8151e537>] ? end_repeat_nmi+0x1e/0x2e
[243573.968001]  [<ffffffff81526160>] ? irq_work_interrupt+0x70/0x70
[243573.968001]  [<ffffffff81526160>] ? irq_work_interrupt+0x70/0x70
[243573.968001]  [<ffffffff81526160>] ? irq_work_interrupt+0x70/0x70
[243573.968001]  <<EOE>>  [<ffffffff8106658c>] ? finish_task_switch+0x57/0x98
[243573.968001]  [<ffffffff8106657b>] ? finish_task_switch+0x46/0x98
[243573.968001]  [<ffffffff8151c9ee>] ? __schedule+0x4f7/0x54b
[243573.968001]  [<ffffffff8151ab1e>] ? schedule_timeout+0x36/0xe5
[243573.968001]  [<ffffffff8106657b>] ? finish_task_switch+0x46/0x98
[243573.968001]  [<ffffffff8151ca13>] ? __schedule+0x51c/0x54b
[243573.968001]  [<ffffffff8151aae8>] ? dump_stack+0x53/0x53
[243573.968001]  [<ffffffff8151cb96>] ? __wait_for_common+0xce/0x14a
[243573.968001]  [<ffffffff810664d6>] ? try_to_wake_up+0x1ab/0x1ab
[243573.968001]  [<ffffffff810a2fff>] ? T.901+0x1e4/0x1e4
[243573.968001]  [<ffffffff810561db>] ? wait_rcu_gp+0x4b/0x54
[243573.968001]  [<ffffffff810561e4>] ? wait_rcu_gp+0x54/0x54
[243573.968001]  [<ffffffff810be94a>] ? perf_trace_event_unreg+0x2e/0xbd
[243573.968001]  [<ffffffff810bea07>] ? perf_trace_destroy+0x2e/0x3b
[243573.968001]  [<ffffffff810c6f9e>] ? __free_event+0x2d/0x52
[243573.968001]  [<ffffffff810c8116>] ? perf_event_release_kernel+0x74/0x7b
[243573.968001]  [<ffffffff810c8358>] ? perf_release+0x10/0x14
[243573.968001]  [<ffffffff811133c5>] ? __fput+0xfa/0x1be
[243573.968001]  [<ffffffff81055ebd>] ? task_work_run+0x88/0xa0
[243573.968001]  [<ffffffff81524ea0>] ? int_signal+0x12/0x17
[243573.968001] ---[ end trace 199b386ed4ca91db ]---
[243573.968001] perf samples too long (7436949 > 10000), lowering kernel.perf_e0
[243573.968001] INFO: NMI handler (perf_event_nmi_handler) took too long to runs
[243589.024001] perf samples too long (7378854 > 20000), lowering kernel.perf_e0
[243589.844001] perf samples too long (7321212 > 40000), lowering kernel.perf_e0
[243590.692001] perf samples too long (7264020 > 76923), lowering kernel.perf_e0
[243591.540001] perf samples too long (7207276 > 142857), lowering kernel.perf_0
[243592.388001] perf samples too long (7150974 > 250000), lowering kernel.perf_0
[243593.236001] perf samples too long (7095112 > 500000), lowering kernel.perf_0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: perf/tracepoint yet another lockup
  2013-11-11 17:03 perf/tracepoint yet another lockup Vince Weaver
@ 2013-11-11 22:28 ` Vince Weaver
  2013-11-12  9:24   ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Vince Weaver @ 2013-11-11 22:28 UTC (permalink / raw)
  To: Vince Weaver
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Frederic Weisbecker,
	Steven Rostedt, Dave Jones


OK, managed to capture a crash, not sure if it was the same as the 
previous one in the thread.

In this case it looks like it is tracepoint event 0x25 which I think is
  vsyscall/emulate_vsyscall/id:37
Although I can't get this to trigger in a standalone workload (it seems to 
trigger best under some sort of load).

So are all tracepoints unsafe?  Or just a subset?

[18367.468004] ------------[ cut here ]------------
[18367.468004] WARNING: CPU: 0 PID: 17541 at kernel/watchdog.c:245 watchdog_ove)
[18367.468004] Watchdog detected hard LOCKUP on cpu 0
[18367.468004] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powern
[18367.468004] CPU: 0 PID: 17541 Comm: perf_fuzzer Not tainted 3.12.0 #2
[18367.468004] Hardware name: AOpen   DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[18367.468004]  00000000000000f5 ffff88011fc07c48 ffffffff8151aad6 ffff88011fc08
[18367.468004]  ffffffff8103cd68 ffff88011fc07c78 ffffffff810a101c ffff88011b310
[18367.468004]  ffff88011fc07ef8 ffff88011fc07d58 0000000000000000 ffff88011fc08
[18367.468004] Call Trace:
[18367.468004]  <NMI>  [<ffffffff8151aad6>] ? dump_stack+0x41/0x53
[18367.468004]  [<ffffffff8103cd68>] ? warn_slowpath_common+0x81/0x9c
[18367.468004]  [<ffffffff810a101c>] ? watchdog_overflow_callback+0x9b/0xa6
[18367.468004]  [<ffffffff8103ce25>] ? warn_slowpath_fmt+0x46/0x48
[18367.468004]  [<ffffffff810c9df2>] ? perf_prepare_sample+0x167/0x2f4
[18367.468004]  [<ffffffff810a101c>] ? watchdog_overflow_callback+0x9b/0xa6
[18367.468004]  [<ffffffff810ca4a2>] ? __perf_event_overflow+0x12e/0x1ab
[18367.468004]  [<ffffffff810167d3>] ? intel_pmu_lbr_read+0x253/0x349
[18367.468004]  [<ffffffff81018c95>] ? intel_pmu_handle_irq+0x2b8/0x34d
[18367.468004]  [<ffffffff8151dc88>] ? _raw_spin_lock+0x12/0x2a
[18367.468004]  [<ffffffff8151edfc>] ? nmi_handle+0xa8/0x13a
[18367.468004]  [<ffffffff8151f473>] ? perf_event_nmi_handler+0x24/0x3d
[18367.468004]  [<ffffffff8151edb2>] ? nmi_handle+0x5e/0x13a
[18367.468004]  [<ffffffff8151ef35>] ? do_nmi+0xa7/0x2bd
[18367.468004]  [<ffffffff8151e537>] ? end_repeat_nmi+0x1e/0x2e
[18367.468004]  [<ffffffff81049a6f>] ? __lock_task_sighand+0x4a/0x6a
[18367.468004]  [<ffffffff81049a6f>] ? __lock_task_sighand+0x4a/0x6a
[18367.468004]  [<ffffffff81049a6f>] ? __lock_task_sighand+0x4a/0x6a
[18367.468004]  <<EOE>>  <IRQ>  [<ffffffff8104b055>] ? do_send_sig_info+0x2c/0x5
[18367.468004]  [<ffffffff8111f683>] ? send_sigio_to_task+0xeb/0xf5
[18367.468004]  [<ffffffff8111f737>] ? send_sigio+0xaa/0xdd
[18367.468004]  [<ffffffff8111f7d3>] ? kill_fasync+0x69/0x86
[18367.468004]  [<ffffffff810cabf9>] ? perf_event_wakeup+0x8c/0xa1
[18367.468004]  [<ffffffff810c494d>] ? __irq_work_run+0x5f/0x7b
[18367.468004]  [<ffffffff81005f82>] ? smp_trace_irq_work_interrupt+0x4b/0x7e
[18367.468004]  [<ffffffff815261ca>] ? trace_irq_work_interrupt+0x6a/0x70
[18367.468004]  [<ffffffff8151da07>] ? _raw_spin_unlock_irqrestore+0xf/0x11
[18367.468004]  [<ffffffff810a3a89>] ? rcu_process_callbacks+0x4a4/0x4b8
[18367.468004]  [<ffffffff8105b8f3>] ? enqueue_hrtimer+0x51/0x7b
[18367.468004]  [<ffffffff81041027>] ? __do_softirq+0x102/0x208
[18367.468004]  [<ffffffff81027608>] ? lapic_next_event+0x1d/0x21
[18367.468004]  [<ffffffff8105c187>] ? hrtimer_interrupt+0x119/0x1c6
[18367.468004]  [<ffffffff815263dc>] ? call_softirq+0x1c/0x30
[18367.468004]  [<ffffffff810041b9>] ? do_softirq+0x38/0x6d
[18367.468004]  [<ffffffff81040d1c>] ? irq_exit+0x45/0xa1
[18367.468004]  [<ffffffff81526a14>] ? smp_trace_apic_timer_interrupt+0x83/0x91
[18367.468004]  [<ffffffff8152597a>] ? trace_apic_timer_interrupt+0x6a/0x70
[18367.468004]  <EOI>  [<ffffffff8151dc88>] ? _raw_spin_lock+0x12/0x2a
[18367.468004]  [<ffffffff81133680>] ? sync_inodes_sb+0x10a/0x1a0
[18367.468004]  [<ffffffff811373d3>] ? fdatawrite_one_bdev+0x17/0x17
[18367.468004]  [<ffffffff8111446f>] ? iterate_supers+0x6e/0xc0
[18367.468004]  [<ffffffff8113741f>] ? sys_sync+0x35/0x83
[18367.468004]  [<ffffffff81524c96>] ? system_call_fastpath+0x1a/0x1f
[18367.468004] ---[ end trace 73e6e6b3ff98085b ]---
[18367.468004] perf samples too long (8137810 > 20000), lowering kernel.perf_ev0
[18367.468004] INFO: NMI handler (perf_event_nmi_handler) took too long to run:s
[18378.460001] perf samples too long (8074239 > 40000), lowering kernel.perf_ev0
[18379.284001] perf samples too long (8011165 > 76923), lowering kernel.perf_ev0
[18380.132001] perf samples too long (7948583 > 142857), lowering kernel.perf_e0
[18380.980001] perf samples too long (7886490 > 250000), lowering kernel.perf_e0
[18381.828001] perf samples too long (7824882 > 500000), lowering kernel.perf_e0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: perf/tracepoint yet another lockup
  2013-11-11 22:28 ` Vince Weaver
@ 2013-11-12  9:24   ` Peter Zijlstra
  2013-11-13 22:39     ` Vince Weaver
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2013-11-12  9:24 UTC (permalink / raw)
  To: Vince Weaver
  Cc: LKML, Ingo Molnar, Frederic Weisbecker, Steven Rostedt, Dave Jones

On Mon, Nov 11, 2013 at 05:28:21PM -0500, Vince Weaver wrote:
> So are all tracepoints unsafe?  Or just a subset?

No, most should be fine. But finding the special ones seems interesting
:/

Disabling them all for now sure seems like the safe option.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: perf/tracepoint yet another lockup
  2013-11-12  9:24   ` Peter Zijlstra
@ 2013-11-13 22:39     ` Vince Weaver
  0 siblings, 0 replies; 4+ messages in thread
From: Vince Weaver @ 2013-11-13 22:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Frederic Weisbecker, Steven Rostedt, Dave Jones

On Tue, 12 Nov 2013, Peter Zijlstra wrote:

> On Mon, Nov 11, 2013 at 05:28:21PM -0500, Vince Weaver wrote:
> > So are all tracepoints unsafe?  Or just a subset?
> 
> No, most should be fine. But finding the special ones seems interesting
> :/
> 
> Disabling them all for now sure seems like the safe option.

Update on this, it was a false positive.  It seems like the other crashes 
I was seeing were also caused by tracepoint 24, the fuzzer just found them 
via another code path I wasn't checking.

So from what I can tell the only troublesome tracepoint I've come across 
is the original irq_vectors/irq_work_exit problem.

Vince


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-11-13 22:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-11 17:03 perf/tracepoint yet another lockup Vince Weaver
2013-11-11 22:28 ` Vince Weaver
2013-11-12  9:24   ` Peter Zijlstra
2013-11-13 22:39     ` Vince Weaver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).