linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* trinity finds ftrace/perf bug. Film at 11.
@ 2013-09-11 13:54 Dave Jones
  2013-09-12 18:19 ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-11 13:54 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Linux Kernel

Usual story. I'm trying to narrow down on a tracing bug I'd reported that we had no
understanding of (https://lkml.org/lkml/2013/8/27/646) and had hacked up
trinity to just only use perf fds, and left it running overnight.

Woke up to _another_ bug.

WARNING: CPU: 1 PID: 23361 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
Modules linked in: bnep can_bcm caif_socket caif phonet af_rxrpc bluetooth llc2 af_key rose netrom bridge stp snd_seq_dummy tun fuse scsi_transport_iscsi ipt_ULOG pppoe pppox ppp_generic slhc can_raw can af_802154 nfnetlink nfc rfkill irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd e1000e pcspkr usb_debug ptp pps_core soundcore
CPU: 1 PID: 23361 Comm: trinity-child31 Not tainted 3.11.0+ #66
 ffffffff81a2a90b ffff8801fe4e9a68 ffffffff8171d32b 0000000000000000
 ffff8801fe4e9aa0 ffffffff81053e5d ffff880102728948 ffff8802459cf3e0
 0000000000000008 0000000000000001 000000002fa8ce99 ffff8801fe4e9ab0
Call Trace:
 [<ffffffff8171d32b>] dump_stack+0x54/0x74
 [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff81142fbd>] perf_swevent_add+0x18d/0x1a0
 [<ffffffff81143b37>] event_sched_in.isra.78+0x87/0x1c0
 [<ffffffff81144a2a>] group_sched_in+0x6a/0x1c0
 [<ffffffff81145721>] ctx_sched_in+0x101/0x290
 [<ffffffff81145910>] perf_event_sched_in+0x60/0x90
 [<ffffffff8114939b>] perf_event_context_sched_in+0x7b/0xc0
 [<ffffffff81149ef7>] __perf_event_task_sched_in+0x477/0x490
 [<ffffffff8109a8e4>] ? arch_vtime_task_switch+0x94/0xa0
 [<ffffffff8108e840>] finish_task_switch+0xf0/0x180
 [<ffffffff817242dc>] __schedule+0x34c/0x9b0
 [<ffffffff81724969>] schedule+0x29/0x70
 [<ffffffff81720e41>] schedule_timeout+0x1c1/0x360
 [<ffffffff810be89f>] ? trace_hardirqs_off_caller+0x1f/0xc0
 [<ffffffff810be94d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff81099e2f>] ? local_clock+0x3f/0x50
 [<ffffffff81726cfc>] ? _raw_spin_unlock_irq+0x2c/0x40
 [<ffffffff810c2065>] ? trace_hardirqs_on_caller+0x115/0x1e0
 [<ffffffff81724e64>] wait_for_completion+0xd4/0x110
 [<ffffffff810965a0>] ? wake_up_state+0x20/0x20
 [<ffffffff81088acd>] __synchronize_srcu+0x12d/0x1f0
 [<ffffffff81088580>] ? call_srcu+0x80/0x80
 [<ffffffff8108e52d>] ? complete+0x1d/0x50
 [<ffffffff81088bad>] synchronize_srcu+0x1d/0x20
 [<ffffffff812049ae>] fsnotify_destroy_group+0x1e/0x40
 [<ffffffff81207868>] SyS_inotify_init1+0xe8/0x110
 [<ffffffff812078a0>] sys_inotify_init+0x10/0x20
 [<ffffffff81730b54>] tracesys+0xdd/0xe2
---[ end trace 2e7ee66c64149375 ]---

That's this in perf_swevent_add

5564 
5565         head = find_swevent_head(swhash, event);
5566         if (WARN_ON_ONCE(!head))
5567                 return -EINVAL;
5568 
5569         hlist_add_head_rcu(&event->hlist_entry, head);
5570 
5571         return 0;
5572 }
5573 

Of the 8 syscalls that trinity was running, three were perf_event_open
inotify_init, and newlstat.  

So then I tried running trinity with just those three syscalls.

And got the same WARN_ON, but from a slightly different path..

WARNING: CPU: 3 PID: 861 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
Modules linked in: ipt_ULOG nfnetlink can_bcm can scsi_transport_iscsi ax25 nfc rfkill af_802154 irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc libcrc32c snd_timer snd e1000e pcspkr ptp pps_core soundcore usb_debug
CPU: 3 PID: 861 Comm: trinity-child31 Not tainted 3.11.0+ #67 
 ffffffff81a2aa43 ffff8801e6c65ae8 ffffffff8171d5cb 0000000000000000
 ffff8801e6c65b20 ffffffff81053e5d ffff8801e66a2e68 ffff880245dcf3e0
 0000000000000004 0000000000000001 0000000004392ac6 ffff8801e6c65b30
Call Trace:
 [<ffffffff8171d5cb>] dump_stack+0x54/0x74
 [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8114302d>] perf_swevent_add+0x18d/0x1a0
 [<ffffffff81143ba7>] event_sched_in.isra.78+0x87/0x1c0
 [<ffffffff81144a9a>] group_sched_in+0x6a/0x1c0
 [<ffffffff8114580c>] ctx_sched_in+0x17c/0x290
 [<ffffffff8114595a>] perf_event_sched_in+0x3a/0x90
 [<ffffffff8114940b>] perf_event_context_sched_in+0x7b/0xc0
 [<ffffffff81149f67>] __perf_event_task_sched_in+0x477/0x490
 [<ffffffff8109a954>] ? arch_vtime_task_switch+0x94/0xa0
 [<ffffffff8108e860>] finish_task_switch+0xf0/0x180
 [<ffffffff81724564>] __schedule+0x344/0xa20
 [<ffffffff81093326>] __cond_resched+0x26/0x30
 [<ffffffff8172507a>] _cond_resched+0x3a/0x50
 [<ffffffff811d7fbe>] clear_inode+0x2e/0x80
 [<ffffffff8122bf33>] proc_evict_inode+0x23/0x70
 [<ffffffff811d8f13>] evict+0xa3/0x1a0
 [<ffffffff811d98b5>] iput+0xf5/0x190
 [<ffffffff811d3c55>] dput+0x245/0x260
 [<ffffffff811c6586>] path_put+0x16/0x30
 [<ffffffff811c138d>] vfs_fstatat+0x6d/0xa0
 [<ffffffff811c1822>] SYSC_newlstat+0x22/0x40
 [<ffffffff811c1a3e>] SyS_newlstat+0xe/0x10
 [<ffffffff81730e54>] tracesys+0xdd/0xe2

The good news is I can reproduce that very quickly.
(Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
 and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)


(There's a different kernel build on the two traces fwiw, but the commits are
just 8d7551eb1916832f2a5b27346edf24e7b2382f67 -> bf83e61464803d386d0ec3fc92e5449d7963a409,
which is just a bunch of drm stuff).

	Dave

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-11 13:54 trinity finds ftrace/perf bug. Film at 11 Dave Jones
@ 2013-09-12 18:19 ` Steven Rostedt
  2013-09-12 18:38   ` Dave Jones
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Steven Rostedt @ 2013-09-12 18:19 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel, Peter Zijlstra, Frederic Weisbecker, Ingo Molnar


This is something that either Peter or Frederic need to look at.

-- Steve


On Wed, 11 Sep 2013 09:54:34 -0400
Dave Jones <davej@redhat.com> wrote:

> Usual story. I'm trying to narrow down on a tracing bug I'd reported that we had no
> understanding of (https://lkml.org/lkml/2013/8/27/646) and had hacked up
> trinity to just only use perf fds, and left it running overnight.
> 
> Woke up to _another_ bug.
> 
> WARNING: CPU: 1 PID: 23361 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
> Modules linked in: bnep can_bcm caif_socket caif phonet af_rxrpc bluetooth llc2 af_key rose netrom bridge stp snd_seq_dummy tun fuse scsi_transport_iscsi ipt_ULOG pppoe pppox ppp_generic slhc can_raw can af_802154 nfnetlink nfc rfkill irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd e1000e pcspkr usb_debug ptp pps_core soundcore
> CPU: 1 PID: 23361 Comm: trinity-child31 Not tainted 3.11.0+ #66
>  ffffffff81a2a90b ffff8801fe4e9a68 ffffffff8171d32b 0000000000000000
>  ffff8801fe4e9aa0 ffffffff81053e5d ffff880102728948 ffff8802459cf3e0
>  0000000000000008 0000000000000001 000000002fa8ce99 ffff8801fe4e9ab0
> Call Trace:
>  [<ffffffff8171d32b>] dump_stack+0x54/0x74
>  [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
>  [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff81142fbd>] perf_swevent_add+0x18d/0x1a0
>  [<ffffffff81143b37>] event_sched_in.isra.78+0x87/0x1c0
>  [<ffffffff81144a2a>] group_sched_in+0x6a/0x1c0
>  [<ffffffff81145721>] ctx_sched_in+0x101/0x290
>  [<ffffffff81145910>] perf_event_sched_in+0x60/0x90
>  [<ffffffff8114939b>] perf_event_context_sched_in+0x7b/0xc0
>  [<ffffffff81149ef7>] __perf_event_task_sched_in+0x477/0x490
>  [<ffffffff8109a8e4>] ? arch_vtime_task_switch+0x94/0xa0
>  [<ffffffff8108e840>] finish_task_switch+0xf0/0x180
>  [<ffffffff817242dc>] __schedule+0x34c/0x9b0
>  [<ffffffff81724969>] schedule+0x29/0x70
>  [<ffffffff81720e41>] schedule_timeout+0x1c1/0x360
>  [<ffffffff810be89f>] ? trace_hardirqs_off_caller+0x1f/0xc0
>  [<ffffffff810be94d>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff81099e2f>] ? local_clock+0x3f/0x50
>  [<ffffffff81726cfc>] ? _raw_spin_unlock_irq+0x2c/0x40
>  [<ffffffff810c2065>] ? trace_hardirqs_on_caller+0x115/0x1e0
>  [<ffffffff81724e64>] wait_for_completion+0xd4/0x110
>  [<ffffffff810965a0>] ? wake_up_state+0x20/0x20
>  [<ffffffff81088acd>] __synchronize_srcu+0x12d/0x1f0
>  [<ffffffff81088580>] ? call_srcu+0x80/0x80
>  [<ffffffff8108e52d>] ? complete+0x1d/0x50
>  [<ffffffff81088bad>] synchronize_srcu+0x1d/0x20
>  [<ffffffff812049ae>] fsnotify_destroy_group+0x1e/0x40
>  [<ffffffff81207868>] SyS_inotify_init1+0xe8/0x110
>  [<ffffffff812078a0>] sys_inotify_init+0x10/0x20
>  [<ffffffff81730b54>] tracesys+0xdd/0xe2
> ---[ end trace 2e7ee66c64149375 ]---
> 
> That's this in perf_swevent_add
> 
> 5564 
> 5565         head = find_swevent_head(swhash, event);
> 5566         if (WARN_ON_ONCE(!head))
> 5567                 return -EINVAL;
> 5568 
> 5569         hlist_add_head_rcu(&event->hlist_entry, head);
> 5570 
> 5571         return 0;
> 5572 }
> 5573 
> 
> Of the 8 syscalls that trinity was running, three were perf_event_open
> inotify_init, and newlstat.  
> 
> So then I tried running trinity with just those three syscalls.
> 
> And got the same WARN_ON, but from a slightly different path..
> 
> WARNING: CPU: 3 PID: 861 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
> Modules linked in: ipt_ULOG nfnetlink can_bcm can scsi_transport_iscsi ax25 nfc rfkill af_802154 irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc libcrc32c snd_timer snd e1000e pcspkr ptp pps_core soundcore usb_debug
> CPU: 3 PID: 861 Comm: trinity-child31 Not tainted 3.11.0+ #67 
>  ffffffff81a2aa43 ffff8801e6c65ae8 ffffffff8171d5cb 0000000000000000
>  ffff8801e6c65b20 ffffffff81053e5d ffff8801e66a2e68 ffff880245dcf3e0
>  0000000000000004 0000000000000001 0000000004392ac6 ffff8801e6c65b30
> Call Trace:
>  [<ffffffff8171d5cb>] dump_stack+0x54/0x74
>  [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
>  [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff8114302d>] perf_swevent_add+0x18d/0x1a0
>  [<ffffffff81143ba7>] event_sched_in.isra.78+0x87/0x1c0
>  [<ffffffff81144a9a>] group_sched_in+0x6a/0x1c0
>  [<ffffffff8114580c>] ctx_sched_in+0x17c/0x290
>  [<ffffffff8114595a>] perf_event_sched_in+0x3a/0x90
>  [<ffffffff8114940b>] perf_event_context_sched_in+0x7b/0xc0
>  [<ffffffff81149f67>] __perf_event_task_sched_in+0x477/0x490
>  [<ffffffff8109a954>] ? arch_vtime_task_switch+0x94/0xa0
>  [<ffffffff8108e860>] finish_task_switch+0xf0/0x180
>  [<ffffffff81724564>] __schedule+0x344/0xa20
>  [<ffffffff81093326>] __cond_resched+0x26/0x30
>  [<ffffffff8172507a>] _cond_resched+0x3a/0x50
>  [<ffffffff811d7fbe>] clear_inode+0x2e/0x80
>  [<ffffffff8122bf33>] proc_evict_inode+0x23/0x70
>  [<ffffffff811d8f13>] evict+0xa3/0x1a0
>  [<ffffffff811d98b5>] iput+0xf5/0x190
>  [<ffffffff811d3c55>] dput+0x245/0x260
>  [<ffffffff811c6586>] path_put+0x16/0x30
>  [<ffffffff811c138d>] vfs_fstatat+0x6d/0xa0
>  [<ffffffff811c1822>] SYSC_newlstat+0x22/0x40
>  [<ffffffff811c1a3e>] SyS_newlstat+0xe/0x10
>  [<ffffffff81730e54>] tracesys+0xdd/0xe2
> 
> The good news is I can reproduce that very quickly.
> (Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
>  and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)
> 
> 
> (There's a different kernel build on the two traces fwiw, but the commits are
> just 8d7551eb1916832f2a5b27346edf24e7b2382f67 -> bf83e61464803d386d0ec3fc92e5449d7963a409,
> which is just a bunch of drm stuff).
> 
> 	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-12 18:19 ` Steven Rostedt
@ 2013-09-12 18:38   ` Dave Jones
  2013-10-03 15:44     ` Peter Zijlstra
  2013-09-13 10:56   ` Peter Zijlstra
  2013-09-13 11:14   ` Peter Zijlstra
  2 siblings, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-12 18:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linux Kernel, Peter Zijlstra, Frederic Weisbecker, Ingo Molnar

On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
 > 
 > This is something that either Peter or Frederic need to look at.

I spent yesterday trying to bisect this, but kept running into other perf/ftrace traces,
which makes it pretty much impossible.

The current one I'm staring at is this from LIST_DEBUG..

list_del corruption. next->prev should be prev (ffff88000fb812b0), but was ffff88000fb812b0. (next=ffff88009df8b7b0).

The sharp eyed will notice that those first two addresses are actually the same.
So we matched the next->prev != prev test, but at the time we got to the printk that follows
in the WARN, the inverse was true. Smells like a race of some kind.

[ 2481.228418] Call Trace:
[ 2481.232705]  [<ffffffff8171f5dd>] dump_stack+0x54/0x74
[ 2481.237025]  [<ffffffff810540bd>] warn_slowpath_common+0x7d/0xa0
[ 2481.241357]  [<ffffffff8105412c>] warn_slowpath_fmt+0x4c/0x50
[ 2481.245637]  [<ffffffff810540e5>] ? warn_slowpath_fmt+0x5/0x50
[ 2481.249890]  [<ffffffff8132c222>] __list_del_entry+0x62/0xf0
[ 2481.254109]  [<ffffffff81143a88>] list_del_event+0xc8/0xe0
[ 2481.258310]  [<ffffffff81143b2b>] perf_remove_from_context+0x8b/0xf0
[ 2481.262497]  [<ffffffff81145444>] perf_event_release_kernel+0x54/0x90
[ 2481.266688]  [<ffffffff811464d3>] put_event+0x133/0x1c0
[ 2481.270866]  [<ffffffff811463d0>] ? put_event+0x30/0x1c0
[ 2481.275066]  [<ffffffff81146570>] perf_release+0x10/0x20
[ 2481.279277]  [<ffffffff811bf2ca>] __fput+0xfa/0x2e0
[ 2481.283488]  [<ffffffff811bf4fe>] ____fput+0xe/0x10
[ 2481.287713]  [<ffffffff8107c11c>] task_work_run+0xac/0xe0
[ 2481.291958]  [<ffffffff810559bc>] do_exit+0x2cc/0xcc0
[ 2481.296194]  [<ffffffff81732880>] ? ftrace_call+0x5/0x2f
[ 2481.300435]  [<ffffffff810577cc>] do_group_exit+0x4c/0xc0
[ 2481.304654]  [<ffffffff81057854>] SyS_exit_group+0x14/0x20
[ 2481.308857]  [<ffffffff81732e54>] tracesys+0xdd/0xe2
[ 2481.313051] ---[ end trace fa3e551eb20e0c0f ]---

	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-12 18:19 ` Steven Rostedt
  2013-09-12 18:38   ` Dave Jones
@ 2013-09-13 10:56   ` Peter Zijlstra
  2013-09-13 14:16     ` Dave Jones
  2013-09-13 14:32     ` Steven Rostedt
  2013-09-13 11:14   ` Peter Zijlstra
  2 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-13 10:56 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:

> > The good news is I can reproduce that very quickly.
> > (Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
> >  and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)

" The paste you are looking for does not exist "

And that line on its own doesn't seem to do much.. its been running for
minutes now.

Or does 'quickly' mean something else?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-12 18:19 ` Steven Rostedt
  2013-09-12 18:38   ` Dave Jones
  2013-09-13 10:56   ` Peter Zijlstra
@ 2013-09-13 11:14   ` Peter Zijlstra
  2013-09-13 14:28     ` Dave Jones
  2013-11-19 19:18     ` [tip:perf/urgent] perf: Remove fragile swevent hlist optimization tip-bot for Peter Zijlstra
  2 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-13 11:14 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
> > WARNING: CPU: 3 PID: 861 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
> > Modules linked in: ipt_ULOG nfnetlink can_bcm can scsi_transport_iscsi ax25 nfc rfkill af_802154 irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc libcrc32c snd_timer snd e1000e pcspkr ptp pps_core soundcore usb_debug
> > CPU: 3 PID: 861 Comm: trinity-child31 Not tainted 3.11.0+ #67 
> >  ffffffff81a2aa43 ffff8801e6c65ae8 ffffffff8171d5cb 0000000000000000
> >  ffff8801e6c65b20 ffffffff81053e5d ffff8801e66a2e68 ffff880245dcf3e0
> >  0000000000000004 0000000000000001 0000000004392ac6 ffff8801e6c65b30
> > Call Trace:
> >  [<ffffffff8171d5cb>] dump_stack+0x54/0x74
> >  [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
> >  [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
> >  [<ffffffff8114302d>] perf_swevent_add+0x18d/0x1a0
> >  [<ffffffff81143ba7>] event_sched_in.isra.78+0x87/0x1c0
> >  [<ffffffff81144a9a>] group_sched_in+0x6a/0x1c0
> >  [<ffffffff8114580c>] ctx_sched_in+0x17c/0x290
> >  [<ffffffff8114595a>] perf_event_sched_in+0x3a/0x90
> >  [<ffffffff8114940b>] perf_event_context_sched_in+0x7b/0xc0
> >  [<ffffffff81149f67>] __perf_event_task_sched_in+0x477/0x490

So I've got an idea how this can happen. If we have a per-cpu swevent
and group it with an uncore counter which lives on another cpu we'll
migrate the swevent using perf_pmu_migrate_context() but it doesn't
migrate the swhash.

The below should be able to confirm that theory if one can reproduce the
issue.

---
 kernel/events/core.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2207efc..e1441f5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5621,11 +5621,6 @@ static void swevent_hlist_put(struct perf_event *event)
 {
 	int cpu;
 
-	if (event->cpu != -1) {
-		swevent_hlist_put_cpu(event, event->cpu);
-		return;
-	}
-
 	for_each_possible_cpu(cpu)
 		swevent_hlist_put_cpu(event, cpu);
 }
@@ -5659,9 +5654,6 @@ static int swevent_hlist_get(struct perf_event *event)
 	int err;
 	int cpu, failed_cpu;
 
-	if (event->cpu != -1)
-		return swevent_hlist_get_cpu(event, event->cpu);
-
 	get_online_cpus();
 	for_each_possible_cpu(cpu) {
 		err = swevent_hlist_get_cpu(event, cpu);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 10:56   ` Peter Zijlstra
@ 2013-09-13 14:16     ` Dave Jones
  2013-09-16  9:37       ` Peter Zijlstra
  2013-09-13 14:32     ` Steven Rostedt
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-13 14:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Fri, Sep 13, 2013 at 12:56:37PM +0200, Peter Zijlstra wrote:
 > On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
 > 
 > > > The good news is I can reproduce that very quickly.
 > > > (Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
 > > >  and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)
 > 
 > " The paste you are looking for does not exist "
 
bah. that sucks. appended below.

 > And that line on its own doesn't seem to do much.. its been running for
 > minutes now.
 > 
 > Or does 'quickly' mean something else?

without the diff, it'll use all kinds of fds. this diff hardwires it to just use
fd's from perf_event_open.  I'll hack up a command line switch to do this in a better
way at some point.

	Dave


diff --git a/fds.c b/fds.c
index cc1ca6d..f48c95e 100644
--- a/fds.c
+++ b/fds.c
@@ -119,6 +119,8 @@ static int get_new_random_fd(void)
 retry:
 	i = rand() % 6;
 
+	i = 3;
+
 	if (do_specific_proto == TRUE)
 		i = 1;
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 11:14   ` Peter Zijlstra
@ 2013-09-13 14:28     ` Dave Jones
  2013-09-13 14:58       ` Peter Zijlstra
  2013-11-19 19:18     ` [tip:perf/urgent] perf: Remove fragile swevent hlist optimization tip-bot for Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-13 14:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Fri, Sep 13, 2013 at 01:14:47PM +0200, Peter Zijlstra wrote:
 > On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
 > > > WARNING: CPU: 3 PID: 861 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
 > > > Modules linked in: ipt_ULOG nfnetlink can_bcm can scsi_transport_iscsi ax25 nfc rfkill af_802154 irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc libcrc32c snd_timer snd e1000e pcspkr ptp pps_core soundcore usb_debug
 > > > CPU: 3 PID: 861 Comm: trinity-child31 Not tainted 3.11.0+ #67 
 > > >  ffffffff81a2aa43 ffff8801e6c65ae8 ffffffff8171d5cb 0000000000000000
 > > >  ffff8801e6c65b20 ffffffff81053e5d ffff8801e66a2e68 ffff880245dcf3e0
 > > >  0000000000000004 0000000000000001 0000000004392ac6 ffff8801e6c65b30
 > > > Call Trace:
 > > >  [<ffffffff8171d5cb>] dump_stack+0x54/0x74
 > > >  [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
 > > >  [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
 > > >  [<ffffffff8114302d>] perf_swevent_add+0x18d/0x1a0
 > > >  [<ffffffff81143ba7>] event_sched_in.isra.78+0x87/0x1c0
 > > >  [<ffffffff81144a9a>] group_sched_in+0x6a/0x1c0
 > > >  [<ffffffff8114580c>] ctx_sched_in+0x17c/0x290
 > > >  [<ffffffff8114595a>] perf_event_sched_in+0x3a/0x90
 > > >  [<ffffffff8114940b>] perf_event_context_sched_in+0x7b/0xc0
 > > >  [<ffffffff81149f67>] __perf_event_task_sched_in+0x477/0x490
 > 
 > So I've got an idea how this can happen. If we have a per-cpu swevent
 > and group it with an uncore counter which lives on another cpu we'll
 > migrate the swevent using perf_pmu_migrate_context() but it doesn't
 > migrate the swhash.
 > 
 > The below should be able to confirm that theory if one can reproduce the
 > issue.

With that, I now get a different bug... yay?

[  212.012514] WARNING: CPU: 1 PID: 575 at kernel/events/core.c:1109 add_event_to_ctx+0x1fd/0x210()
[  212.016034] Modules linked in: pppoe pppox ppp_generic slhc ipt_ULOG can_raw can scsi_transport_iscsi nfnetlink af_802154 nfc rfkill irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd e1000e usb_debug soundcore ptp pps_core pcspkr
[  212.026484] CPU: 1 PID: 575 Comm: trinity-child11 Not tainted 3.11.0+ #75 
[  212.031121]  ffffffff81a2b40e ffff88023087dcc0 ffffffff81720482 0000000000000000
[  212.033279]  ffff88023087dcf8 ffffffff810540bd ffff88024197e418 ffff88023976ae68
[  212.035471]  ffff88024197e418 000000007b016bc1 ffff8802419a9530 ffff88023087dd08
[  212.037549] Call Trace:
[  212.038277]  [<ffffffff81720482>] dump_stack+0x54/0x74
[  212.039604]  [<ffffffff810540bd>] warn_slowpath_common+0x7d/0xa0
[  212.040946]  [<ffffffff8105419a>] warn_slowpath_null+0x1a/0x20
[  212.042296]  [<ffffffff8114383d>] add_event_to_ctx+0x1fd/0x210
[  212.043650]  [<ffffffff81149a36>] __perf_install_in_context+0x136/0x260
[  212.045147]  [<ffffffff811434a0>] ? rcu_read_lock_held+0x50/0x50
[  212.046523]  [<ffffffff811434e0>] remote_function+0x40/0x50
[  212.047759]  [<ffffffff810cc670>] smp_call_function_single+0x190/0x1e0
[  212.048337]  [<ffffffff8107d969>] ? wait_rcu_gp+0x79/0xa0
[  212.048374]  [<ffffffff810cc4e5>] ? smp_call_function_single+0x5/0x1e0
[  212.048418]  [<ffffffff811423c4>] task_function_call+0x44/0x50
[  212.048458]  [<ffffffff81149900>] ? perf_cpu_hrtimer_handler+0x1f0/0x1f0
[  212.048502]  [<ffffffff811438d7>] perf_install_in_context+0x87/0x100
[  212.048546]  [<ffffffff8114eebc>] SYSC_perf_event_open+0xcec/0xea0
[  212.048590]  [<ffffffff8114f449>] SyS_perf_event_open+0x9/0x10
[  212.048629]  [<ffffffff81733d54>] tracesys+0xdd/0xe2

 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 10:56   ` Peter Zijlstra
  2013-09-13 14:16     ` Dave Jones
@ 2013-09-13 14:32     ` Steven Rostedt
  2013-09-13 14:58       ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2013-09-13 14:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Fri, 13 Sep 2013 12:56:37 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
> 
> > > The good news is I can reproduce that very quickly.
> > > (Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
> > >  and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)
> 
> " The paste you are looking for does not exist "
> 

"These are not the pasties you are looking for"


-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 14:28     ` Dave Jones
@ 2013-09-13 14:58       ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-13 14:58 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker,
	Ingo Molnar

On Fri, Sep 13, 2013 at 10:28:56AM -0400, Dave Jones wrote:
> With that, I now get a different bug... yay?
> 
> [  212.012514] WARNING: CPU: 1 PID: 575 at kernel/events/core.c:1109 add_event_to_ctx+0x1fd/0x210()
> [  212.016034] Modules linked in: pppoe pppox ppp_generic slhc ipt_ULOG can_raw can scsi_transport_iscsi nfnetlink af_802154 nfc rfkill irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd e1000e usb_debug soundcore ptp pps_core pcspkr
> [  212.026484] CPU: 1 PID: 575 Comm: trinity-child11 Not tainted 3.11.0+ #75 
> [  212.031121]  ffffffff81a2b40e ffff88023087dcc0 ffffffff81720482 0000000000000000
> [  212.033279]  ffff88023087dcf8 ffffffff810540bd ffff88024197e418 ffff88023976ae68
> [  212.035471]  ffff88024197e418 000000007b016bc1 ffff8802419a9530 ffff88023087dd08
> [  212.037549] Call Trace:
> [  212.038277]  [<ffffffff81720482>] dump_stack+0x54/0x74
> [  212.039604]  [<ffffffff810540bd>] warn_slowpath_common+0x7d/0xa0
> [  212.040946]  [<ffffffff8105419a>] warn_slowpath_null+0x1a/0x20
> [  212.042296]  [<ffffffff8114383d>] add_event_to_ctx+0x1fd/0x210
> [  212.043650]  [<ffffffff81149a36>] __perf_install_in_context+0x136/0x260
> [  212.045147]  [<ffffffff811434a0>] ? rcu_read_lock_held+0x50/0x50
> [  212.046523]  [<ffffffff811434e0>] remote_function+0x40/0x50
> [  212.047759]  [<ffffffff810cc670>] smp_call_function_single+0x190/0x1e0
> [  212.048337]  [<ffffffff8107d969>] ? wait_rcu_gp+0x79/0xa0
> [  212.048374]  [<ffffffff810cc4e5>] ? smp_call_function_single+0x5/0x1e0
> [  212.048418]  [<ffffffff811423c4>] task_function_call+0x44/0x50
> [  212.048458]  [<ffffffff81149900>] ? perf_cpu_hrtimer_handler+0x1f0/0x1f0
> [  212.048502]  [<ffffffff811438d7>] perf_install_in_context+0x87/0x100
> [  212.048546]  [<ffffffff8114eebc>] SYSC_perf_event_open+0xcec/0xea0
> [  212.048590]  [<ffffffff8114f449>] SyS_perf_event_open+0x9/0x10
> [  212.048629]  [<ffffffff81733d54>] tracesys+0xdd/0xe2

Hmm.. seems somewhat connected to your earlier list corruption. I'm
going over that code as we speak, but its 5pm on a friday, need to run
soon.

Could you attach that trinity patch to an email? Maybe I can reproduce
locally tonight or tomorrow.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 14:32     ` Steven Rostedt
@ 2013-09-13 14:58       ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-13 14:58 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Fri, Sep 13, 2013 at 10:32:38AM -0400, Steven Rostedt wrote:
> "These are not the pasties you are looking for"

I was so reading pastries there.. and I'd be sooo looking for pastries
yumm! :-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 14:16     ` Dave Jones
@ 2013-09-16  9:37       ` Peter Zijlstra
  2013-09-16 10:25         ` Ingo Molnar
  2013-09-16 16:44         ` Dave Jones
  0 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-16  9:37 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker,
	Ingo Molnar

On Fri, Sep 13, 2013 at 10:16:38AM -0400, Dave Jones wrote:
> without the diff, it'll use all kinds of fds. this diff hardwires it to just use
> fd's from perf_event_open.  I'll hack up a command line switch to do this in a better
> way at some point.
> 
> 	Dave
> 
> 
> diff --git a/fds.c b/fds.c
> index cc1ca6d..f48c95e 100644
> --- a/fds.c
> +++ b/fds.c
> @@ -119,6 +119,8 @@ static int get_new_random_fd(void)
>  retry:
>  	i = rand() % 6;
>  
> +	i = 3;
> +
>  	if (do_specific_proto == TRUE)
>  		i = 1;

I'm just not having much luck it seems, even after actually enabling
list debugging (D0H).

What kind of hardware are you running this on?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16  9:37       ` Peter Zijlstra
@ 2013-09-16 10:25         ` Ingo Molnar
  2013-09-16 10:43           ` Peter Zijlstra
  2013-09-16 16:44         ` Dave Jones
  1 sibling, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2013-09-16 10:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Sep 13, 2013 at 10:16:38AM -0400, Dave Jones wrote:
> > without the diff, it'll use all kinds of fds. this diff hardwires it to just use
> > fd's from perf_event_open.  I'll hack up a command line switch to do this in a better
> > way at some point.
> > 
> > 	Dave
> > 
> > 
> > diff --git a/fds.c b/fds.c
> > index cc1ca6d..f48c95e 100644
> > --- a/fds.c
> > +++ b/fds.c
> > @@ -119,6 +119,8 @@ static int get_new_random_fd(void)
> >  retry:
> >  	i = rand() % 6;
> >  
> > +	i = 3;
> > +
> >  	if (do_specific_proto == TRUE)
> >  		i = 1;
> 
> I'm just not having much luck it seems, even after actually enabling
> list debugging (D0H).

Maybe Dave could send you his kernel config, to make sure you two run 
comparable kernel images?

> What kind of hardware are you running this on?

That's a possible variation too indeed.

Also, Dave, do you run a single copy of Trinity, or multiple ones? Any 
special options for Peter to reproduce your crashes?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16 10:25         ` Ingo Molnar
@ 2013-09-16 10:43           ` Peter Zijlstra
  2013-09-16 16:45             ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-16 10:43 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker

On Mon, Sep 16, 2013 at 12:25:35PM +0200, Ingo Molnar wrote:
> Also, Dave, do you run a single copy of Trinity, or multiple ones? Any 
> special options for Peter to reproduce your crashes?

Trinity seems to spawn a process for each cpu available. So I get to run
24 copies -- I presume they're all seeded differently.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16  9:37       ` Peter Zijlstra
  2013-09-16 10:25         ` Ingo Molnar
@ 2013-09-16 16:44         ` Dave Jones
  2013-09-18 11:34           ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-16 16:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Mon, Sep 16, 2013 at 11:37:19AM +0200, Peter Zijlstra wrote:
 > On Fri, Sep 13, 2013 at 10:16:38AM -0400, Dave Jones wrote:
 > > without the diff, it'll use all kinds of fds. this diff hardwires it to just use
 > > fd's from perf_event_open.  I'll hack up a command line switch to do this in a better
 > > way at some point.
 > > 
 > > 	Dave
 > > 
 > > 
 > > diff --git a/fds.c b/fds.c
 > > index cc1ca6d..f48c95e 100644
 > > --- a/fds.c
 > > +++ b/fds.c
 > > @@ -119,6 +119,8 @@ static int get_new_random_fd(void)
 > >  retry:
 > >  	i = rand() % 6;
 > >  
 > > +	i = 3;
 > > +
 > >  	if (do_specific_proto == TRUE)
 > >  		i = 1;
 > 
 > I'm just not having much luck it seems, even after actually enabling
 > list debugging (D0H).

may be some other option.. Try http://www.codemonkey.org.uk/junk/haswell-config

 > What kind of hardware are you running this on?

model name	: Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz

	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16 10:43           ` Peter Zijlstra
@ 2013-09-16 16:45             ` Dave Jones
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jones @ 2013-09-16 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux Kernel, Frederic Weisbecker

On Mon, Sep 16, 2013 at 12:43:49PM +0200, Peter Zijlstra wrote:
 > On Mon, Sep 16, 2013 at 12:25:35PM +0200, Ingo Molnar wrote:
 > > Also, Dave, do you run a single copy of Trinity, or multiple ones? Any 
 > > special options for Peter to reproduce your crashes?
 > 
 > Trinity seems to spawn a process for each cpu available. So I get to run
 > 24 copies -- I presume they're all seeded differently.

They are. And I can trip up that bug with just a single instance of trinity.
(I was running more processes than cpu count though, with -C32)

	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16 16:44         ` Dave Jones
@ 2013-09-18 11:34           ` Peter Zijlstra
  2013-09-18 13:22             ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-18 11:34 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker,
	Ingo Molnar

On Mon, Sep 16, 2013 at 12:44:21PM -0400, Dave Jones wrote:
> 
> may be some other option.. Try http://www.codemonkey.org.uk/junk/haswell-config
> 
>  > What kind of hardware are you running this on?
> 
> model name	: Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz

Humpf,.. of course I don't have a hsw system :/

I'll try raising the -C count to 4*nr_cpus for the next few runs.

OK, seems I need to edit trinity for that, someone thought 64 was a big
number.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-18 11:34           ` Peter Zijlstra
@ 2013-09-18 13:22             ` Dave Jones
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jones @ 2013-09-18 13:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Wed, Sep 18, 2013 at 01:34:11PM +0200, Peter Zijlstra wrote:
 > On Mon, Sep 16, 2013 at 12:44:21PM -0400, Dave Jones wrote:
 > > 
 > > may be some other option.. Try http://www.codemonkey.org.uk/junk/haswell-config
 > > 
 > >  > What kind of hardware are you running this on?
 > > 
 > > model name	: Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz
 > 
 > Humpf,.. of course I don't have a hsw system :/
 > 
 > I'll try raising the -C count to 4*nr_cpus for the next few runs.
 > 
 > OK, seems I need to edit trinity for that, someone thought 64 was a big
 > number.

heh, at one time, the walk through the list of child processes in watchdog.c was
taking so long that it considered children that hadn't been scheduled for
a while as stuck. I'm not sure that's any better right now, so you might have
to modify that code too.

	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-12 18:38   ` Dave Jones
@ 2013-10-03 15:44     ` Peter Zijlstra
  2013-10-03 16:57       ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2013-10-03 15:44 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker,
	Ingo Molnar, jolsa

On Thu, Sep 12, 2013 at 02:38:49PM -0400, Dave Jones wrote:

> The current one I'm staring at is this from LIST_DEBUG..
> 
> list_del corruption. next->prev should be prev (ffff88000fb812b0), but was ffff88000fb812b0. (next=ffff88009df8b7b0).
> 
> The sharp eyed will notice that those first two addresses are actually the same.
> So we matched the next->prev != prev test, but at the time we got to the printk that follows
> in the WARN, the inverse was true. Smells like a race of some kind.

FWIW, I've not given up on this. I've found one bug in the list_entry
handling already -- although not this one.

Today I hit a more revealing error:

[ 8348.150303] list_del corruption. prev->next should be ffff88042839c108, but was 6b6b6b6b6b6b6b6b

And as we all know that 6b is POISON_FREE, so we're staring at a
use-after-free here.

Brain did give out for today though.. more staring at reference counts
tomorrow.

The sad thing is that reproduction takes forever on my machine; I need
to let trinity run for a good hour and then ^C abort the run. Only then
will I hit it fairly reliably.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-10-03 15:44     ` Peter Zijlstra
@ 2013-10-03 16:57       ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2013-10-03 16:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker, jolsa


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 12, 2013 at 02:38:49PM -0400, Dave Jones wrote:
> 
> > The current one I'm staring at is this from LIST_DEBUG..
> > 
> > list_del corruption. next->prev should be prev (ffff88000fb812b0), but was ffff88000fb812b0. (next=ffff88009df8b7b0).
> > 
> > The sharp eyed will notice that those first two addresses are actually the same.
> > So we matched the next->prev != prev test, but at the time we got to the printk that follows
> > in the WARN, the inverse was true. Smells like a race of some kind.
> 
> FWIW, I've not given up on this. I've found one bug in the list_entry
> handling already -- although not this one.
> 
> Today I hit a more revealing error:
> 
> [ 8348.150303] list_del corruption. prev->next should be ffff88042839c108, but was 6b6b6b6b6b6b6b6b
> 
> And as we all know that 6b is POISON_FREE, so we're staring at a
> use-after-free here.
> 
> Brain did give out for today though.. more staring at reference counts
> tomorrow.
> 
> The sad thing is that reproduction takes forever on my machine; I need 
> to let trinity run for a good hour and then ^C abort the run. Only then 
> will I hit it fairly reliably.

Maybe run trinity in an infinite loop and also run this script in the 
background:

	while :; do killall -SIGINT trinity; sleep $[RANDOM/500]; done &

this will Ctrl-C trinity every 0...60 seconds, randomized, average 30 
seconds runtime.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:perf/urgent] perf: Remove fragile swevent hlist optimization
  2013-09-13 11:14   ` Peter Zijlstra
  2013-09-13 14:28     ` Dave Jones
@ 2013-11-19 19:18     ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot for Peter Zijlstra @ 2013-11-19 19:18 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx

Commit-ID:  06db0b21712f878b808480ef31097637013bbf0f
Gitweb:     http://git.kernel.org/tip/06db0b21712f878b808480ef31097637013bbf0f
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 13 Sep 2013 13:14:47 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 19 Nov 2013 16:57:42 +0100

perf: Remove fragile swevent hlist optimization

Currently we only allocate a single cpu hashtable for per-cpu
swevents; do away with this optimization for it is fragile in the face
of things like perf_pmu_migrate_context().

The easiest thing is to make sure all CPUs are consistent wrt state.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130913111447.GN31370@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index d724e77..72348dc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5680,11 +5680,6 @@ static void swevent_hlist_put(struct perf_event *event)
 {
 	int cpu;
 
-	if (event->cpu != -1) {
-		swevent_hlist_put_cpu(event, event->cpu);
-		return;
-	}
-
 	for_each_possible_cpu(cpu)
 		swevent_hlist_put_cpu(event, cpu);
 }
@@ -5718,9 +5713,6 @@ static int swevent_hlist_get(struct perf_event *event)
 	int err;
 	int cpu, failed_cpu;
 
-	if (event->cpu != -1)
-		return swevent_hlist_get_cpu(event, event->cpu);
-
 	get_online_cpus();
 	for_each_possible_cpu(cpu) {
 		err = swevent_hlist_get_cpu(event, cpu);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-11-19 19:19 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-11 13:54 trinity finds ftrace/perf bug. Film at 11 Dave Jones
2013-09-12 18:19 ` Steven Rostedt
2013-09-12 18:38   ` Dave Jones
2013-10-03 15:44     ` Peter Zijlstra
2013-10-03 16:57       ` Ingo Molnar
2013-09-13 10:56   ` Peter Zijlstra
2013-09-13 14:16     ` Dave Jones
2013-09-16  9:37       ` Peter Zijlstra
2013-09-16 10:25         ` Ingo Molnar
2013-09-16 10:43           ` Peter Zijlstra
2013-09-16 16:45             ` Dave Jones
2013-09-16 16:44         ` Dave Jones
2013-09-18 11:34           ` Peter Zijlstra
2013-09-18 13:22             ` Dave Jones
2013-09-13 14:32     ` Steven Rostedt
2013-09-13 14:58       ` Peter Zijlstra
2013-09-13 11:14   ` Peter Zijlstra
2013-09-13 14:28     ` Dave Jones
2013-09-13 14:58       ` Peter Zijlstra
2013-11-19 19:18     ` [tip:perf/urgent] perf: Remove fragile swevent hlist optimization tip-bot for Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).