All of lore.kernel.org
 help / color / mirror / Atom feed
* trinity finds ftrace/perf bug. Film at 11.
@ 2013-09-11 13:54 Dave Jones
  2013-09-12 18:19 ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-11 13:54 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Linux Kernel

Usual story. I'm trying to narrow down on a tracing bug I'd reported that we had no
understanding of (https://lkml.org/lkml/2013/8/27/646) and had hacked up
trinity to just only use perf fds, and left it running overnight.

Woke up to _another_ bug.

WARNING: CPU: 1 PID: 23361 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
Modules linked in: bnep can_bcm caif_socket caif phonet af_rxrpc bluetooth llc2 af_key rose netrom bridge stp snd_seq_dummy tun fuse scsi_transport_iscsi ipt_ULOG pppoe pppox ppp_generic slhc can_raw can af_802154 nfnetlink nfc rfkill irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd e1000e pcspkr usb_debug ptp pps_core soundcore
CPU: 1 PID: 23361 Comm: trinity-child31 Not tainted 3.11.0+ #66
 ffffffff81a2a90b ffff8801fe4e9a68 ffffffff8171d32b 0000000000000000
 ffff8801fe4e9aa0 ffffffff81053e5d ffff880102728948 ffff8802459cf3e0
 0000000000000008 0000000000000001 000000002fa8ce99 ffff8801fe4e9ab0
Call Trace:
 [<ffffffff8171d32b>] dump_stack+0x54/0x74
 [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff81142fbd>] perf_swevent_add+0x18d/0x1a0
 [<ffffffff81143b37>] event_sched_in.isra.78+0x87/0x1c0
 [<ffffffff81144a2a>] group_sched_in+0x6a/0x1c0
 [<ffffffff81145721>] ctx_sched_in+0x101/0x290
 [<ffffffff81145910>] perf_event_sched_in+0x60/0x90
 [<ffffffff8114939b>] perf_event_context_sched_in+0x7b/0xc0
 [<ffffffff81149ef7>] __perf_event_task_sched_in+0x477/0x490
 [<ffffffff8109a8e4>] ? arch_vtime_task_switch+0x94/0xa0
 [<ffffffff8108e840>] finish_task_switch+0xf0/0x180
 [<ffffffff817242dc>] __schedule+0x34c/0x9b0
 [<ffffffff81724969>] schedule+0x29/0x70
 [<ffffffff81720e41>] schedule_timeout+0x1c1/0x360
 [<ffffffff810be89f>] ? trace_hardirqs_off_caller+0x1f/0xc0
 [<ffffffff810be94d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff81099e2f>] ? local_clock+0x3f/0x50
 [<ffffffff81726cfc>] ? _raw_spin_unlock_irq+0x2c/0x40
 [<ffffffff810c2065>] ? trace_hardirqs_on_caller+0x115/0x1e0
 [<ffffffff81724e64>] wait_for_completion+0xd4/0x110
 [<ffffffff810965a0>] ? wake_up_state+0x20/0x20
 [<ffffffff81088acd>] __synchronize_srcu+0x12d/0x1f0
 [<ffffffff81088580>] ? call_srcu+0x80/0x80
 [<ffffffff8108e52d>] ? complete+0x1d/0x50
 [<ffffffff81088bad>] synchronize_srcu+0x1d/0x20
 [<ffffffff812049ae>] fsnotify_destroy_group+0x1e/0x40
 [<ffffffff81207868>] SyS_inotify_init1+0xe8/0x110
 [<ffffffff812078a0>] sys_inotify_init+0x10/0x20
 [<ffffffff81730b54>] tracesys+0xdd/0xe2
---[ end trace 2e7ee66c64149375 ]---

That's this in perf_swevent_add

5564 
5565         head = find_swevent_head(swhash, event);
5566         if (WARN_ON_ONCE(!head))
5567                 return -EINVAL;
5568 
5569         hlist_add_head_rcu(&event->hlist_entry, head);
5570 
5571         return 0;
5572 }
5573 

Of the 8 syscalls that trinity was running, three were perf_event_open
inotify_init, and newlstat.  

So then I tried running trinity with just those three syscalls.

And got the same WARN_ON, but from a slightly different path..

WARNING: CPU: 3 PID: 861 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
Modules linked in: ipt_ULOG nfnetlink can_bcm can scsi_transport_iscsi ax25 nfc rfkill af_802154 irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc libcrc32c snd_timer snd e1000e pcspkr ptp pps_core soundcore usb_debug
CPU: 3 PID: 861 Comm: trinity-child31 Not tainted 3.11.0+ #67 
 ffffffff81a2aa43 ffff8801e6c65ae8 ffffffff8171d5cb 0000000000000000
 ffff8801e6c65b20 ffffffff81053e5d ffff8801e66a2e68 ffff880245dcf3e0
 0000000000000004 0000000000000001 0000000004392ac6 ffff8801e6c65b30
Call Trace:
 [<ffffffff8171d5cb>] dump_stack+0x54/0x74
 [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8114302d>] perf_swevent_add+0x18d/0x1a0
 [<ffffffff81143ba7>] event_sched_in.isra.78+0x87/0x1c0
 [<ffffffff81144a9a>] group_sched_in+0x6a/0x1c0
 [<ffffffff8114580c>] ctx_sched_in+0x17c/0x290
 [<ffffffff8114595a>] perf_event_sched_in+0x3a/0x90
 [<ffffffff8114940b>] perf_event_context_sched_in+0x7b/0xc0
 [<ffffffff81149f67>] __perf_event_task_sched_in+0x477/0x490
 [<ffffffff8109a954>] ? arch_vtime_task_switch+0x94/0xa0
 [<ffffffff8108e860>] finish_task_switch+0xf0/0x180
 [<ffffffff81724564>] __schedule+0x344/0xa20
 [<ffffffff81093326>] __cond_resched+0x26/0x30
 [<ffffffff8172507a>] _cond_resched+0x3a/0x50
 [<ffffffff811d7fbe>] clear_inode+0x2e/0x80
 [<ffffffff8122bf33>] proc_evict_inode+0x23/0x70
 [<ffffffff811d8f13>] evict+0xa3/0x1a0
 [<ffffffff811d98b5>] iput+0xf5/0x190
 [<ffffffff811d3c55>] dput+0x245/0x260
 [<ffffffff811c6586>] path_put+0x16/0x30
 [<ffffffff811c138d>] vfs_fstatat+0x6d/0xa0
 [<ffffffff811c1822>] SYSC_newlstat+0x22/0x40
 [<ffffffff811c1a3e>] SyS_newlstat+0xe/0x10
 [<ffffffff81730e54>] tracesys+0xdd/0xe2

The good news is I can reproduce that very quickly.
(Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
 and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)


(There's a different kernel build on the two traces fwiw, but the commits are
just 8d7551eb1916832f2a5b27346edf24e7b2382f67 -> bf83e61464803d386d0ec3fc92e5449d7963a409,
which is just a bunch of drm stuff).

	Dave

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-11 13:54 trinity finds ftrace/perf bug. Film at 11 Dave Jones
@ 2013-09-12 18:19 ` Steven Rostedt
  2013-09-12 18:38   ` Dave Jones
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Steven Rostedt @ 2013-09-12 18:19 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel, Peter Zijlstra, Frederic Weisbecker, Ingo Molnar


This is something that either Peter or Frederic need to look at.

-- Steve


On Wed, 11 Sep 2013 09:54:34 -0400
Dave Jones <davej@redhat.com> wrote:

> Usual story. I'm trying to narrow down on a tracing bug I'd reported that we had no
> understanding of (https://lkml.org/lkml/2013/8/27/646) and had hacked up
> trinity to just only use perf fds, and left it running overnight.
> 
> Woke up to _another_ bug.
> 
> WARNING: CPU: 1 PID: 23361 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
> Modules linked in: bnep can_bcm caif_socket caif phonet af_rxrpc bluetooth llc2 af_key rose netrom bridge stp snd_seq_dummy tun fuse scsi_transport_iscsi ipt_ULOG pppoe pppox ppp_generic slhc can_raw can af_802154 nfnetlink nfc rfkill irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd e1000e pcspkr usb_debug ptp pps_core soundcore
> CPU: 1 PID: 23361 Comm: trinity-child31 Not tainted 3.11.0+ #66
>  ffffffff81a2a90b ffff8801fe4e9a68 ffffffff8171d32b 0000000000000000
>  ffff8801fe4e9aa0 ffffffff81053e5d ffff880102728948 ffff8802459cf3e0
>  0000000000000008 0000000000000001 000000002fa8ce99 ffff8801fe4e9ab0
> Call Trace:
>  [<ffffffff8171d32b>] dump_stack+0x54/0x74
>  [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
>  [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff81142fbd>] perf_swevent_add+0x18d/0x1a0
>  [<ffffffff81143b37>] event_sched_in.isra.78+0x87/0x1c0
>  [<ffffffff81144a2a>] group_sched_in+0x6a/0x1c0
>  [<ffffffff81145721>] ctx_sched_in+0x101/0x290
>  [<ffffffff81145910>] perf_event_sched_in+0x60/0x90
>  [<ffffffff8114939b>] perf_event_context_sched_in+0x7b/0xc0
>  [<ffffffff81149ef7>] __perf_event_task_sched_in+0x477/0x490
>  [<ffffffff8109a8e4>] ? arch_vtime_task_switch+0x94/0xa0
>  [<ffffffff8108e840>] finish_task_switch+0xf0/0x180
>  [<ffffffff817242dc>] __schedule+0x34c/0x9b0
>  [<ffffffff81724969>] schedule+0x29/0x70
>  [<ffffffff81720e41>] schedule_timeout+0x1c1/0x360
>  [<ffffffff810be89f>] ? trace_hardirqs_off_caller+0x1f/0xc0
>  [<ffffffff810be94d>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff81099e2f>] ? local_clock+0x3f/0x50
>  [<ffffffff81726cfc>] ? _raw_spin_unlock_irq+0x2c/0x40
>  [<ffffffff810c2065>] ? trace_hardirqs_on_caller+0x115/0x1e0
>  [<ffffffff81724e64>] wait_for_completion+0xd4/0x110
>  [<ffffffff810965a0>] ? wake_up_state+0x20/0x20
>  [<ffffffff81088acd>] __synchronize_srcu+0x12d/0x1f0
>  [<ffffffff81088580>] ? call_srcu+0x80/0x80
>  [<ffffffff8108e52d>] ? complete+0x1d/0x50
>  [<ffffffff81088bad>] synchronize_srcu+0x1d/0x20
>  [<ffffffff812049ae>] fsnotify_destroy_group+0x1e/0x40
>  [<ffffffff81207868>] SyS_inotify_init1+0xe8/0x110
>  [<ffffffff812078a0>] sys_inotify_init+0x10/0x20
>  [<ffffffff81730b54>] tracesys+0xdd/0xe2
> ---[ end trace 2e7ee66c64149375 ]---
> 
> That's this in perf_swevent_add
> 
> 5564 
> 5565         head = find_swevent_head(swhash, event);
> 5566         if (WARN_ON_ONCE(!head))
> 5567                 return -EINVAL;
> 5568 
> 5569         hlist_add_head_rcu(&event->hlist_entry, head);
> 5570 
> 5571         return 0;
> 5572 }
> 5573 
> 
> Of the 8 syscalls that trinity was running, three were perf_event_open
> inotify_init, and newlstat.  
> 
> So then I tried running trinity with just those three syscalls.
> 
> And got the same WARN_ON, but from a slightly different path..
> 
> WARNING: CPU: 3 PID: 861 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
> Modules linked in: ipt_ULOG nfnetlink can_bcm can scsi_transport_iscsi ax25 nfc rfkill af_802154 irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc libcrc32c snd_timer snd e1000e pcspkr ptp pps_core soundcore usb_debug
> CPU: 3 PID: 861 Comm: trinity-child31 Not tainted 3.11.0+ #67 
>  ffffffff81a2aa43 ffff8801e6c65ae8 ffffffff8171d5cb 0000000000000000
>  ffff8801e6c65b20 ffffffff81053e5d ffff8801e66a2e68 ffff880245dcf3e0
>  0000000000000004 0000000000000001 0000000004392ac6 ffff8801e6c65b30
> Call Trace:
>  [<ffffffff8171d5cb>] dump_stack+0x54/0x74
>  [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
>  [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff8114302d>] perf_swevent_add+0x18d/0x1a0
>  [<ffffffff81143ba7>] event_sched_in.isra.78+0x87/0x1c0
>  [<ffffffff81144a9a>] group_sched_in+0x6a/0x1c0
>  [<ffffffff8114580c>] ctx_sched_in+0x17c/0x290
>  [<ffffffff8114595a>] perf_event_sched_in+0x3a/0x90
>  [<ffffffff8114940b>] perf_event_context_sched_in+0x7b/0xc0
>  [<ffffffff81149f67>] __perf_event_task_sched_in+0x477/0x490
>  [<ffffffff8109a954>] ? arch_vtime_task_switch+0x94/0xa0
>  [<ffffffff8108e860>] finish_task_switch+0xf0/0x180
>  [<ffffffff81724564>] __schedule+0x344/0xa20
>  [<ffffffff81093326>] __cond_resched+0x26/0x30
>  [<ffffffff8172507a>] _cond_resched+0x3a/0x50
>  [<ffffffff811d7fbe>] clear_inode+0x2e/0x80
>  [<ffffffff8122bf33>] proc_evict_inode+0x23/0x70
>  [<ffffffff811d8f13>] evict+0xa3/0x1a0
>  [<ffffffff811d98b5>] iput+0xf5/0x190
>  [<ffffffff811d3c55>] dput+0x245/0x260
>  [<ffffffff811c6586>] path_put+0x16/0x30
>  [<ffffffff811c138d>] vfs_fstatat+0x6d/0xa0
>  [<ffffffff811c1822>] SYSC_newlstat+0x22/0x40
>  [<ffffffff811c1a3e>] SyS_newlstat+0xe/0x10
>  [<ffffffff81730e54>] tracesys+0xdd/0xe2
> 
> The good news is I can reproduce that very quickly.
> (Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
>  and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)
> 
> 
> (There's a different kernel build on the two traces fwiw, but the commits are
> just 8d7551eb1916832f2a5b27346edf24e7b2382f67 -> bf83e61464803d386d0ec3fc92e5449d7963a409,
> which is just a bunch of drm stuff).
> 
> 	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-12 18:19 ` Steven Rostedt
@ 2013-09-12 18:38   ` Dave Jones
  2013-10-03 15:44     ` Peter Zijlstra
  2013-09-13 10:56   ` Peter Zijlstra
  2013-09-13 11:14   ` Peter Zijlstra
  2 siblings, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-12 18:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linux Kernel, Peter Zijlstra, Frederic Weisbecker, Ingo Molnar

On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
 > 
 > This is something that either Peter or Frederic need to look at.

I spent yesterday trying to bisect this, but kept running into other perf/ftrace traces,
which makes it pretty much impossible.

The current one I'm staring at is this from LIST_DEBUG..

list_del corruption. next->prev should be prev (ffff88000fb812b0), but was ffff88000fb812b0. (next=ffff88009df8b7b0).

The sharp eyed will notice that those first two addresses are actually the same.
So we matched the next->prev != prev test, but at the time we got to the printk that follows
in the WARN, the inverse was true. Smells like a race of some kind.

[ 2481.228418] Call Trace:
[ 2481.232705]  [<ffffffff8171f5dd>] dump_stack+0x54/0x74
[ 2481.237025]  [<ffffffff810540bd>] warn_slowpath_common+0x7d/0xa0
[ 2481.241357]  [<ffffffff8105412c>] warn_slowpath_fmt+0x4c/0x50
[ 2481.245637]  [<ffffffff810540e5>] ? warn_slowpath_fmt+0x5/0x50
[ 2481.249890]  [<ffffffff8132c222>] __list_del_entry+0x62/0xf0
[ 2481.254109]  [<ffffffff81143a88>] list_del_event+0xc8/0xe0
[ 2481.258310]  [<ffffffff81143b2b>] perf_remove_from_context+0x8b/0xf0
[ 2481.262497]  [<ffffffff81145444>] perf_event_release_kernel+0x54/0x90
[ 2481.266688]  [<ffffffff811464d3>] put_event+0x133/0x1c0
[ 2481.270866]  [<ffffffff811463d0>] ? put_event+0x30/0x1c0
[ 2481.275066]  [<ffffffff81146570>] perf_release+0x10/0x20
[ 2481.279277]  [<ffffffff811bf2ca>] __fput+0xfa/0x2e0
[ 2481.283488]  [<ffffffff811bf4fe>] ____fput+0xe/0x10
[ 2481.287713]  [<ffffffff8107c11c>] task_work_run+0xac/0xe0
[ 2481.291958]  [<ffffffff810559bc>] do_exit+0x2cc/0xcc0
[ 2481.296194]  [<ffffffff81732880>] ? ftrace_call+0x5/0x2f
[ 2481.300435]  [<ffffffff810577cc>] do_group_exit+0x4c/0xc0
[ 2481.304654]  [<ffffffff81057854>] SyS_exit_group+0x14/0x20
[ 2481.308857]  [<ffffffff81732e54>] tracesys+0xdd/0xe2
[ 2481.313051] ---[ end trace fa3e551eb20e0c0f ]---

	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-12 18:19 ` Steven Rostedt
  2013-09-12 18:38   ` Dave Jones
@ 2013-09-13 10:56   ` Peter Zijlstra
  2013-09-13 14:16     ` Dave Jones
  2013-09-13 14:32     ` Steven Rostedt
  2013-09-13 11:14   ` Peter Zijlstra
  2 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-13 10:56 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:

> > The good news is I can reproduce that very quickly.
> > (Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
> >  and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)

" The paste you are looking for does not exist "

And that line on its own doesn't seem to do much.. its been running for
minutes now.

Or does 'quickly' mean something else?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-12 18:19 ` Steven Rostedt
  2013-09-12 18:38   ` Dave Jones
  2013-09-13 10:56   ` Peter Zijlstra
@ 2013-09-13 11:14   ` Peter Zijlstra
  2013-09-13 14:28     ` Dave Jones
  2013-11-19 19:18     ` [tip:perf/urgent] perf: Remove fragile swevent hlist optimization tip-bot for Peter Zijlstra
  2 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-13 11:14 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
> > WARNING: CPU: 3 PID: 861 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
> > Modules linked in: ipt_ULOG nfnetlink can_bcm can scsi_transport_iscsi ax25 nfc rfkill af_802154 irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc libcrc32c snd_timer snd e1000e pcspkr ptp pps_core soundcore usb_debug
> > CPU: 3 PID: 861 Comm: trinity-child31 Not tainted 3.11.0+ #67 
> >  ffffffff81a2aa43 ffff8801e6c65ae8 ffffffff8171d5cb 0000000000000000
> >  ffff8801e6c65b20 ffffffff81053e5d ffff8801e66a2e68 ffff880245dcf3e0
> >  0000000000000004 0000000000000001 0000000004392ac6 ffff8801e6c65b30
> > Call Trace:
> >  [<ffffffff8171d5cb>] dump_stack+0x54/0x74
> >  [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
> >  [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
> >  [<ffffffff8114302d>] perf_swevent_add+0x18d/0x1a0
> >  [<ffffffff81143ba7>] event_sched_in.isra.78+0x87/0x1c0
> >  [<ffffffff81144a9a>] group_sched_in+0x6a/0x1c0
> >  [<ffffffff8114580c>] ctx_sched_in+0x17c/0x290
> >  [<ffffffff8114595a>] perf_event_sched_in+0x3a/0x90
> >  [<ffffffff8114940b>] perf_event_context_sched_in+0x7b/0xc0
> >  [<ffffffff81149f67>] __perf_event_task_sched_in+0x477/0x490

So I've got an idea how this can happen. If we have a per-cpu swevent
and group it with an uncore counter which lives on another cpu we'll
migrate the swevent using perf_pmu_migrate_context() but it doesn't
migrate the swhash.

The below should be able to confirm that theory if one can reproduce the
issue.

---
 kernel/events/core.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2207efc..e1441f5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5621,11 +5621,6 @@ static void swevent_hlist_put(struct perf_event *event)
 {
 	int cpu;
 
-	if (event->cpu != -1) {
-		swevent_hlist_put_cpu(event, event->cpu);
-		return;
-	}
-
 	for_each_possible_cpu(cpu)
 		swevent_hlist_put_cpu(event, cpu);
 }
@@ -5659,9 +5654,6 @@ static int swevent_hlist_get(struct perf_event *event)
 	int err;
 	int cpu, failed_cpu;
 
-	if (event->cpu != -1)
-		return swevent_hlist_get_cpu(event, event->cpu);
-
 	get_online_cpus();
 	for_each_possible_cpu(cpu) {
 		err = swevent_hlist_get_cpu(event, cpu);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 10:56   ` Peter Zijlstra
@ 2013-09-13 14:16     ` Dave Jones
  2013-09-16  9:37       ` Peter Zijlstra
  2013-09-13 14:32     ` Steven Rostedt
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-13 14:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Fri, Sep 13, 2013 at 12:56:37PM +0200, Peter Zijlstra wrote:
 > On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
 > 
 > > > The good news is I can reproduce that very quickly.
 > > > (Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
 > > >  and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)
 > 
 > " The paste you are looking for does not exist "
 
bah. that sucks. appended below.

 > And that line on its own doesn't seem to do much.. its been running for
 > minutes now.
 > 
 > Or does 'quickly' mean something else?

without the diff, it'll use all kinds of fds. this diff hardwires it to just use
fd's from perf_event_open.  I'll hack up a command line switch to do this in a better
way at some point.

	Dave


diff --git a/fds.c b/fds.c
index cc1ca6d..f48c95e 100644
--- a/fds.c
+++ b/fds.c
@@ -119,6 +119,8 @@ static int get_new_random_fd(void)
 retry:
 	i = rand() % 6;
 
+	i = 3;
+
 	if (do_specific_proto == TRUE)
 		i = 1;
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 11:14   ` Peter Zijlstra
@ 2013-09-13 14:28     ` Dave Jones
  2013-09-13 14:58       ` Peter Zijlstra
  2013-11-19 19:18     ` [tip:perf/urgent] perf: Remove fragile swevent hlist optimization tip-bot for Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-13 14:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Fri, Sep 13, 2013 at 01:14:47PM +0200, Peter Zijlstra wrote:
 > On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
 > > > WARNING: CPU: 3 PID: 861 at kernel/events/core.c:5566 perf_swevent_add+0x18d/0x1a0()
 > > > Modules linked in: ipt_ULOG nfnetlink can_bcm can scsi_transport_iscsi ax25 nfc rfkill af_802154 irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc snd_hda_codec_realtek snd_hda_codec_hdmi xfs snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc libcrc32c snd_timer snd e1000e pcspkr ptp pps_core soundcore usb_debug
 > > > CPU: 3 PID: 861 Comm: trinity-child31 Not tainted 3.11.0+ #67 
 > > >  ffffffff81a2aa43 ffff8801e6c65ae8 ffffffff8171d5cb 0000000000000000
 > > >  ffff8801e6c65b20 ffffffff81053e5d ffff8801e66a2e68 ffff880245dcf3e0
 > > >  0000000000000004 0000000000000001 0000000004392ac6 ffff8801e6c65b30
 > > > Call Trace:
 > > >  [<ffffffff8171d5cb>] dump_stack+0x54/0x74
 > > >  [<ffffffff81053e5d>] warn_slowpath_common+0x7d/0xa0
 > > >  [<ffffffff81053f3a>] warn_slowpath_null+0x1a/0x20
 > > >  [<ffffffff8114302d>] perf_swevent_add+0x18d/0x1a0
 > > >  [<ffffffff81143ba7>] event_sched_in.isra.78+0x87/0x1c0
 > > >  [<ffffffff81144a9a>] group_sched_in+0x6a/0x1c0
 > > >  [<ffffffff8114580c>] ctx_sched_in+0x17c/0x290
 > > >  [<ffffffff8114595a>] perf_event_sched_in+0x3a/0x90
 > > >  [<ffffffff8114940b>] perf_event_context_sched_in+0x7b/0xc0
 > > >  [<ffffffff81149f67>] __perf_event_task_sched_in+0x477/0x490
 > 
 > So I've got an idea how this can happen. If we have a per-cpu swevent
 > and group it with an uncore counter which lives on another cpu we'll
 > migrate the swevent using perf_pmu_migrate_context() but it doesn't
 > migrate the swhash.
 > 
 > The below should be able to confirm that theory if one can reproduce the
 > issue.

With that, I now get a different bug... yay?

[  212.012514] WARNING: CPU: 1 PID: 575 at kernel/events/core.c:1109 add_event_to_ctx+0x1fd/0x210()
[  212.016034] Modules linked in: pppoe pppox ppp_generic slhc ipt_ULOG can_raw can scsi_transport_iscsi nfnetlink af_802154 nfc rfkill irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd e1000e usb_debug soundcore ptp pps_core pcspkr
[  212.026484] CPU: 1 PID: 575 Comm: trinity-child11 Not tainted 3.11.0+ #75 
[  212.031121]  ffffffff81a2b40e ffff88023087dcc0 ffffffff81720482 0000000000000000
[  212.033279]  ffff88023087dcf8 ffffffff810540bd ffff88024197e418 ffff88023976ae68
[  212.035471]  ffff88024197e418 000000007b016bc1 ffff8802419a9530 ffff88023087dd08
[  212.037549] Call Trace:
[  212.038277]  [<ffffffff81720482>] dump_stack+0x54/0x74
[  212.039604]  [<ffffffff810540bd>] warn_slowpath_common+0x7d/0xa0
[  212.040946]  [<ffffffff8105419a>] warn_slowpath_null+0x1a/0x20
[  212.042296]  [<ffffffff8114383d>] add_event_to_ctx+0x1fd/0x210
[  212.043650]  [<ffffffff81149a36>] __perf_install_in_context+0x136/0x260
[  212.045147]  [<ffffffff811434a0>] ? rcu_read_lock_held+0x50/0x50
[  212.046523]  [<ffffffff811434e0>] remote_function+0x40/0x50
[  212.047759]  [<ffffffff810cc670>] smp_call_function_single+0x190/0x1e0
[  212.048337]  [<ffffffff8107d969>] ? wait_rcu_gp+0x79/0xa0
[  212.048374]  [<ffffffff810cc4e5>] ? smp_call_function_single+0x5/0x1e0
[  212.048418]  [<ffffffff811423c4>] task_function_call+0x44/0x50
[  212.048458]  [<ffffffff81149900>] ? perf_cpu_hrtimer_handler+0x1f0/0x1f0
[  212.048502]  [<ffffffff811438d7>] perf_install_in_context+0x87/0x100
[  212.048546]  [<ffffffff8114eebc>] SYSC_perf_event_open+0xcec/0xea0
[  212.048590]  [<ffffffff8114f449>] SyS_perf_event_open+0x9/0x10
[  212.048629]  [<ffffffff81733d54>] tracesys+0xdd/0xe2

 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 10:56   ` Peter Zijlstra
  2013-09-13 14:16     ` Dave Jones
@ 2013-09-13 14:32     ` Steven Rostedt
  2013-09-13 14:58       ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2013-09-13 14:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Fri, 13 Sep 2013 12:56:37 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 12, 2013 at 02:19:13PM -0400, Steven Rostedt wrote:
> 
> > > The good news is I can reproduce that very quickly.
> > > (Apply http://paste.fedoraproject.org/38721/37890755 on top of trinity.git,
> > >  and run ./trinity -l off -q -C32 -c inotify_init1 -c perf_event_open -c newlstat)
> 
> " The paste you are looking for does not exist "
> 

"These are not the pasties you are looking for"


-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 14:28     ` Dave Jones
@ 2013-09-13 14:58       ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-13 14:58 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker,
	Ingo Molnar

On Fri, Sep 13, 2013 at 10:28:56AM -0400, Dave Jones wrote:
> With that, I now get a different bug... yay?
> 
> [  212.012514] WARNING: CPU: 1 PID: 575 at kernel/events/core.c:1109 add_event_to_ctx+0x1fd/0x210()
> [  212.016034] Modules linked in: pppoe pppox ppp_generic slhc ipt_ULOG can_raw can scsi_transport_iscsi nfnetlink af_802154 nfc rfkill irda crc_ccitt rds x25 atm appletalk ipx p8023 psnap p8022 llc ax25 xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd e1000e usb_debug soundcore ptp pps_core pcspkr
> [  212.026484] CPU: 1 PID: 575 Comm: trinity-child11 Not tainted 3.11.0+ #75 
> [  212.031121]  ffffffff81a2b40e ffff88023087dcc0 ffffffff81720482 0000000000000000
> [  212.033279]  ffff88023087dcf8 ffffffff810540bd ffff88024197e418 ffff88023976ae68
> [  212.035471]  ffff88024197e418 000000007b016bc1 ffff8802419a9530 ffff88023087dd08
> [  212.037549] Call Trace:
> [  212.038277]  [<ffffffff81720482>] dump_stack+0x54/0x74
> [  212.039604]  [<ffffffff810540bd>] warn_slowpath_common+0x7d/0xa0
> [  212.040946]  [<ffffffff8105419a>] warn_slowpath_null+0x1a/0x20
> [  212.042296]  [<ffffffff8114383d>] add_event_to_ctx+0x1fd/0x210
> [  212.043650]  [<ffffffff81149a36>] __perf_install_in_context+0x136/0x260
> [  212.045147]  [<ffffffff811434a0>] ? rcu_read_lock_held+0x50/0x50
> [  212.046523]  [<ffffffff811434e0>] remote_function+0x40/0x50
> [  212.047759]  [<ffffffff810cc670>] smp_call_function_single+0x190/0x1e0
> [  212.048337]  [<ffffffff8107d969>] ? wait_rcu_gp+0x79/0xa0
> [  212.048374]  [<ffffffff810cc4e5>] ? smp_call_function_single+0x5/0x1e0
> [  212.048418]  [<ffffffff811423c4>] task_function_call+0x44/0x50
> [  212.048458]  [<ffffffff81149900>] ? perf_cpu_hrtimer_handler+0x1f0/0x1f0
> [  212.048502]  [<ffffffff811438d7>] perf_install_in_context+0x87/0x100
> [  212.048546]  [<ffffffff8114eebc>] SYSC_perf_event_open+0xcec/0xea0
> [  212.048590]  [<ffffffff8114f449>] SyS_perf_event_open+0x9/0x10
> [  212.048629]  [<ffffffff81733d54>] tracesys+0xdd/0xe2

Hmm.. seems somewhat connected to your earlier list corruption. I'm
going over that code as we speak, but its 5pm on a friday, need to run
soon.

Could you attach that trinity patch to an email? Maybe I can reproduce
locally tonight or tomorrow.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 14:32     ` Steven Rostedt
@ 2013-09-13 14:58       ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-13 14:58 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Dave Jones, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Fri, Sep 13, 2013 at 10:32:38AM -0400, Steven Rostedt wrote:
> "These are not the pasties you are looking for"

I was so reading pastries there.. and I'd be sooo looking for pastries
yumm! :-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-13 14:16     ` Dave Jones
@ 2013-09-16  9:37       ` Peter Zijlstra
  2013-09-16 10:25         ` Ingo Molnar
  2013-09-16 16:44         ` Dave Jones
  0 siblings, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-16  9:37 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker,
	Ingo Molnar

On Fri, Sep 13, 2013 at 10:16:38AM -0400, Dave Jones wrote:
> without the diff, it'll use all kinds of fds. this diff hardwires it to just use
> fd's from perf_event_open.  I'll hack up a command line switch to do this in a better
> way at some point.
> 
> 	Dave
> 
> 
> diff --git a/fds.c b/fds.c
> index cc1ca6d..f48c95e 100644
> --- a/fds.c
> +++ b/fds.c
> @@ -119,6 +119,8 @@ static int get_new_random_fd(void)
>  retry:
>  	i = rand() % 6;
>  
> +	i = 3;
> +
>  	if (do_specific_proto == TRUE)
>  		i = 1;

I'm just not having much luck it seems, even after actually enabling
list debugging (D0H).

What kind of hardware are you running this on?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16  9:37       ` Peter Zijlstra
@ 2013-09-16 10:25         ` Ingo Molnar
  2013-09-16 10:43           ` Peter Zijlstra
  2013-09-16 16:44         ` Dave Jones
  1 sibling, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2013-09-16 10:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Sep 13, 2013 at 10:16:38AM -0400, Dave Jones wrote:
> > without the diff, it'll use all kinds of fds. this diff hardwires it to just use
> > fd's from perf_event_open.  I'll hack up a command line switch to do this in a better
> > way at some point.
> > 
> > 	Dave
> > 
> > 
> > diff --git a/fds.c b/fds.c
> > index cc1ca6d..f48c95e 100644
> > --- a/fds.c
> > +++ b/fds.c
> > @@ -119,6 +119,8 @@ static int get_new_random_fd(void)
> >  retry:
> >  	i = rand() % 6;
> >  
> > +	i = 3;
> > +
> >  	if (do_specific_proto == TRUE)
> >  		i = 1;
> 
> I'm just not having much luck it seems, even after actually enabling
> list debugging (D0H).

Maybe Dave could send you his kernel config, to make sure you two run 
comparable kernel images?

> What kind of hardware are you running this on?

That's a possible variation too indeed.

Also, Dave, do you run a single copy of Trinity, or multiple ones? Any 
special options for Peter to reproduce your crashes?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16 10:25         ` Ingo Molnar
@ 2013-09-16 10:43           ` Peter Zijlstra
  2013-09-16 16:45             ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-16 10:43 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker

On Mon, Sep 16, 2013 at 12:25:35PM +0200, Ingo Molnar wrote:
> Also, Dave, do you run a single copy of Trinity, or multiple ones? Any 
> special options for Peter to reproduce your crashes?

Trinity seems to spawn a process for each cpu available. So I get to run
24 copies -- I presume they're all seeded differently.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16  9:37       ` Peter Zijlstra
  2013-09-16 10:25         ` Ingo Molnar
@ 2013-09-16 16:44         ` Dave Jones
  2013-09-18 11:34           ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Jones @ 2013-09-16 16:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Mon, Sep 16, 2013 at 11:37:19AM +0200, Peter Zijlstra wrote:
 > On Fri, Sep 13, 2013 at 10:16:38AM -0400, Dave Jones wrote:
 > > without the diff, it'll use all kinds of fds. this diff hardwires it to just use
 > > fd's from perf_event_open.  I'll hack up a command line switch to do this in a better
 > > way at some point.
 > > 
 > > 	Dave
 > > 
 > > 
 > > diff --git a/fds.c b/fds.c
 > > index cc1ca6d..f48c95e 100644
 > > --- a/fds.c
 > > +++ b/fds.c
 > > @@ -119,6 +119,8 @@ static int get_new_random_fd(void)
 > >  retry:
 > >  	i = rand() % 6;
 > >  
 > > +	i = 3;
 > > +
 > >  	if (do_specific_proto == TRUE)
 > >  		i = 1;
 > 
 > I'm just not having much luck it seems, even after actually enabling
 > list debugging (D0H).

may be some other option.. Try http://www.codemonkey.org.uk/junk/haswell-config

 > What kind of hardware are you running this on?

model name	: Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz

	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16 10:43           ` Peter Zijlstra
@ 2013-09-16 16:45             ` Dave Jones
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jones @ 2013-09-16 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Steven Rostedt, Linux Kernel, Frederic Weisbecker

On Mon, Sep 16, 2013 at 12:43:49PM +0200, Peter Zijlstra wrote:
 > On Mon, Sep 16, 2013 at 12:25:35PM +0200, Ingo Molnar wrote:
 > > Also, Dave, do you run a single copy of Trinity, or multiple ones? Any 
 > > special options for Peter to reproduce your crashes?
 > 
 > Trinity seems to spawn a process for each cpu available. So I get to run
 > 24 copies -- I presume they're all seeded differently.

They are. And I can trip up that bug with just a single instance of trinity.
(I was running more processes than cpu count though, with -C32)

	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-16 16:44         ` Dave Jones
@ 2013-09-18 11:34           ` Peter Zijlstra
  2013-09-18 13:22             ` Dave Jones
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2013-09-18 11:34 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker,
	Ingo Molnar

On Mon, Sep 16, 2013 at 12:44:21PM -0400, Dave Jones wrote:
> 
> may be some other option.. Try http://www.codemonkey.org.uk/junk/haswell-config
> 
>  > What kind of hardware are you running this on?
> 
> model name	: Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz

Humpf,.. of course I don't have a hsw system :/

I'll try raising the -C count to 4*nr_cpus for the next few runs.

OK, seems I need to edit trinity for that, someone thought 64 was a big
number.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-18 11:34           ` Peter Zijlstra
@ 2013-09-18 13:22             ` Dave Jones
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Jones @ 2013-09-18 13:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Linux Kernel, Frederic Weisbecker, Ingo Molnar

On Wed, Sep 18, 2013 at 01:34:11PM +0200, Peter Zijlstra wrote:
 > On Mon, Sep 16, 2013 at 12:44:21PM -0400, Dave Jones wrote:
 > > 
 > > may be some other option.. Try http://www.codemonkey.org.uk/junk/haswell-config
 > > 
 > >  > What kind of hardware are you running this on?
 > > 
 > > model name	: Intel(R) Core(TM) i5-4670T CPU @ 2.30GHz
 > 
 > Humpf,.. of course I don't have a hsw system :/
 > 
 > I'll try raising the -C count to 4*nr_cpus for the next few runs.
 > 
 > OK, seems I need to edit trinity for that, someone thought 64 was a big
 > number.

heh, at one time, the walk through the list of child processes in watchdog.c was
taking so long that it considered children that hadn't been scheduled for
a while as stuck. I'm not sure that's any better right now, so you might have
to modify that code too.

	Dave


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-09-12 18:38   ` Dave Jones
@ 2013-10-03 15:44     ` Peter Zijlstra
  2013-10-03 16:57       ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2013-10-03 15:44 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker,
	Ingo Molnar, jolsa

On Thu, Sep 12, 2013 at 02:38:49PM -0400, Dave Jones wrote:

> The current one I'm staring at is this from LIST_DEBUG..
> 
> list_del corruption. next->prev should be prev (ffff88000fb812b0), but was ffff88000fb812b0. (next=ffff88009df8b7b0).
> 
> The sharp eyed will notice that those first two addresses are actually the same.
> So we matched the next->prev != prev test, but at the time we got to the printk that follows
> in the WARN, the inverse was true. Smells like a race of some kind.

FWIW, I've not given up on this. I've found one bug in the list_entry
handling already -- although not this one.

Today I hit a more revealing error:

[ 8348.150303] list_del corruption. prev->next should be ffff88042839c108, but was 6b6b6b6b6b6b6b6b

And as we all know that 6b is POISON_FREE, so we're staring at a
use-after-free here.

Brain did give out for today though.. more staring at reference counts
tomorrow.

The sad thing is that reproduction takes forever on my machine; I need
to let trinity run for a good hour and then ^C abort the run. Only then
will I hit it fairly reliably.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: trinity finds ftrace/perf bug. Film at 11.
  2013-10-03 15:44     ` Peter Zijlstra
@ 2013-10-03 16:57       ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2013-10-03 16:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Jones, Steven Rostedt, Linux Kernel, Frederic Weisbecker, jolsa


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 12, 2013 at 02:38:49PM -0400, Dave Jones wrote:
> 
> > The current one I'm staring at is this from LIST_DEBUG..
> > 
> > list_del corruption. next->prev should be prev (ffff88000fb812b0), but was ffff88000fb812b0. (next=ffff88009df8b7b0).
> > 
> > The sharp eyed will notice that those first two addresses are actually the same.
> > So we matched the next->prev != prev test, but at the time we got to the printk that follows
> > in the WARN, the inverse was true. Smells like a race of some kind.
> 
> FWIW, I've not given up on this. I've found one bug in the list_entry
> handling already -- although not this one.
> 
> Today I hit a more revealing error:
> 
> [ 8348.150303] list_del corruption. prev->next should be ffff88042839c108, but was 6b6b6b6b6b6b6b6b
> 
> And as we all know that 6b is POISON_FREE, so we're staring at a
> use-after-free here.
> 
> Brain did give out for today though.. more staring at reference counts
> tomorrow.
> 
> The sad thing is that reproduction takes forever on my machine; I need 
> to let trinity run for a good hour and then ^C abort the run. Only then 
> will I hit it fairly reliably.

Maybe run trinity in an infinite loop and also run this script in the 
background:

	while :; do killall -SIGINT trinity; sleep $[RANDOM/500]; done &

this will Ctrl-C trinity every 0...60 seconds, randomized, average 30 
seconds runtime.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:perf/urgent] perf: Remove fragile swevent hlist optimization
  2013-09-13 11:14   ` Peter Zijlstra
  2013-09-13 14:28     ` Dave Jones
@ 2013-11-19 19:18     ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot for Peter Zijlstra @ 2013-11-19 19:18 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx

Commit-ID:  06db0b21712f878b808480ef31097637013bbf0f
Gitweb:     http://git.kernel.org/tip/06db0b21712f878b808480ef31097637013bbf0f
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 13 Sep 2013 13:14:47 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 19 Nov 2013 16:57:42 +0100

perf: Remove fragile swevent hlist optimization

Currently we only allocate a single cpu hashtable for per-cpu
swevents; do away with this optimization for it is fragile in the face
of things like perf_pmu_migrate_context().

The easiest thing is to make sure all CPUs are consistent wrt state.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130913111447.GN31370@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index d724e77..72348dc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5680,11 +5680,6 @@ static void swevent_hlist_put(struct perf_event *event)
 {
 	int cpu;
 
-	if (event->cpu != -1) {
-		swevent_hlist_put_cpu(event, event->cpu);
-		return;
-	}
-
 	for_each_possible_cpu(cpu)
 		swevent_hlist_put_cpu(event, cpu);
 }
@@ -5718,9 +5713,6 @@ static int swevent_hlist_get(struct perf_event *event)
 	int err;
 	int cpu, failed_cpu;
 
-	if (event->cpu != -1)
-		return swevent_hlist_get_cpu(event, event->cpu);
-
 	get_online_cpus();
 	for_each_possible_cpu(cpu) {
 		err = swevent_hlist_get_cpu(event, cpu);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-11-19 19:19 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-11 13:54 trinity finds ftrace/perf bug. Film at 11 Dave Jones
2013-09-12 18:19 ` Steven Rostedt
2013-09-12 18:38   ` Dave Jones
2013-10-03 15:44     ` Peter Zijlstra
2013-10-03 16:57       ` Ingo Molnar
2013-09-13 10:56   ` Peter Zijlstra
2013-09-13 14:16     ` Dave Jones
2013-09-16  9:37       ` Peter Zijlstra
2013-09-16 10:25         ` Ingo Molnar
2013-09-16 10:43           ` Peter Zijlstra
2013-09-16 16:45             ` Dave Jones
2013-09-16 16:44         ` Dave Jones
2013-09-18 11:34           ` Peter Zijlstra
2013-09-18 13:22             ` Dave Jones
2013-09-13 14:32     ` Steven Rostedt
2013-09-13 14:58       ` Peter Zijlstra
2013-09-13 11:14   ` Peter Zijlstra
2013-09-13 14:28     ` Dave Jones
2013-09-13 14:58       ` Peter Zijlstra
2013-11-19 19:18     ` [tip:perf/urgent] perf: Remove fragile swevent hlist optimization tip-bot for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.