linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Another RCU trace. (3.10-rc5)
@ 2013-06-10 19:51 Dave Jones
  2013-06-10 20:33 ` Paul E. McKenney
  2013-06-10 20:38 ` Steven Rostedt
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Jones @ 2013-06-10 19:51 UTC (permalink / raw)
  To: Paul E. McKenney, Linux Kernel, fweisbec; +Cc: rostedt

Something in this code still isn't right in rc5.
I got a pretty mangled trace (and then total lockup) pretty quickly..

[ 1300.705212] ===============================
[ 1300.707514] [ INFO: suspicious RCU usage. ]
[ 1300.709794] 3.10.0-rc5+ #6 Not tainted
[ 1300.712064] -------------------------------
[ 1300.714363] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
[ 1300.717516] 
[ 1300.717516] other info that might help us debug this:
[ 1300.717516] 
[ 1300.723450] 
[ 1300.723450] RCU used illegally from idle CPU!
[ 1300.723450] rcu_scheduler_active = 1, debug_locks = 0
[ 1300.730124] RCU used illegally from extended quiescent state!
[ 1300.732971] 2 locks held by kworker/0:0/12904:
[ 1300.735395]  #0:  (&(&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [<ffffffff8109203e>] vtime_account_system.part.1+0x1e/0x50
[ 1300.740229]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810a5815>] cpuacct_account_field+0x5/0x200
[ 1300.744305] 
[ 1300.744305] stack backtrace:
[ 1300.747817] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6 
[ 1300.756486] ------------[ cut here ]------------
[ 1300.759463] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
[ 1300.780868] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.783521] ------------[ cut here ]------------
[ 1300.784848] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
[ 1300.792216] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.795211] ------------[ cut here ]------------
[ 1300.796734] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
[ 1300.804965] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.808270] ------------[ cut here ]------------
[ 1300.809921] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
[ 1300.811591] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
[ 1300.818924] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.822581] ------------[ cut here ]------------
[ 1300.824419] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
[ 1300.826271] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
[ 1300.834252] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
[ 1300.838244] ------------[ cut here ]------------
[ 1300.840231] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
[ 1300.895079] WARNING: at kernel/sched/cputime.c:652 get_v[ 1300.8 af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_h snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e sn
[ 1300.907729] CPU: 0 PID: 12904 Comm: kworker/0:0 Not
[ 1300.914988]  ffffffff8104a1b1 000000000b89a53f ffff
[ 1300.917385]  0000000000000008 ffffffff81101b57 ffff
[ 1300.919760] Call Trace:
[ 1300.922120]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
[ 1300.924429]  [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
[ 1300.926721]  [<ffffffff81101b57>] ? rcu_eqs_exit_common+0x357/0x380
[ 1300.928993]  [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
[ 1300.931243]  [<ffffffff81091b94>] get_vtime_delta+0x64/0x80
[ 1300.933487]  [<ffffffff810922f9>] vtime_account_user+0x29/0x[ 1300.935702]  [<ffffffff81145285>] user_exit+0[ 1300.937882]  [<ffffffff816f0591>] do_page_fau[ 1300.940037]  [<ffffffff816ed162>] page_fault+0x22/0x3[ 1300.942158]  [<ffffffff81101b57>] ? rcu_eqs_e[ 1300.944270]  [<ffffffff81308fb9>] ? copy_user[ 1300.946365]  [<ffffffff8114b46c>] ? __probe_k[ 1300.948437]  [<ffffffff81072598>] print_worke[ 1300.950502]  [<ffffffff8104de32>] dump_stack_[ 1300.952543]  [<ffffffff816e39d2>] dum[ 1300.954546]  [<ffffffff8104a1b1>] warn_slowpa[ 1300.956525]  [<ffffffff8104a28a>] warn_slowpa[ 1300.958471]  [<ffffffff81101b57>] rcu[ 1300.960401]  [<ffffffff81091b94>] ? get_vtime[ 1300.962309]  [<ffffffff81101bda>] rcu[ 1300.964196]  [<ffffffff811039e3>] rcu_user_ex[ 1300.966071]  [<ffffffff81145275>] use[ 1300.967911]  [<ffffffff816f0591>] do_[ 1300.969729]  [<ffffffff816ed162>] pag[ 1300.971510]  [<ffffffff81091b94>] ? get_vtime[ 1300.973270]  [<ffffffff81308fb9>] ? c[ 1300.975014]  [ 1300.989735]  [ 1300.991250]  [<ffffffff816f05[ 1300.992759]  [<ffffffff816ed1[ 1300.994234]  [<ffffffff81101b57>] ? r[ 1300.995698]  [<ffffffff81308f[ 1300.997130]  [<ffffffff8114b4[ 1300.998534]  [<ffffffff810725[ 1300.999918]  [<ffffffff8104de[ 1301.001284]  [<ffffffff816e39[ 1301.002620]  [<ffffffff8104a1[ 1301.003955]  [<ffffff[ 1301.005255]  [<ffffffff81101b[ 1301.006533]  [<ffffffff81091b[ 1301.007780]  [<ffffff[ 1301.008997]  [<ffffffff811039[ 1301.010190]  [<ffffff[ 1301.011355]  [<ffffff[ 1301.012504]  [<ffffffff816ed1[ 1301.013642]  [<ffffff[ 1301.014758]  [<ffffff[ 1301.015852]  [<ffffff[ 1301.016913]  [<ffffff[ 1301.017948]  [<ffffff[ 1301.018965]  [<ffffff[ 1301.019951]  [<ffffff[ 1301.020924]  [ 1301.021895]  [<ffffff[ 1301.022861]  [<ffffff[ 1301.023820]  [<ffffff[ 1301.024776]  [<ffffff[ 1301.025726]  [ 1301.026679]  [<ffffff[ 1301.027626]  [<ffffff[ 1301.028585]  [<ffffff[ 1301.029544]  [ 1301.030502]  [<ffffff[ 1301.031463]  [<ffffff[ 1301.032430]  [<ffffff[ 1301.033391]  [<ffffff[ 1301.034351]  [ 1301.035305]  [<ffffff[ 1301.036264]  [<ffffff[ 1301.037218]  [ 1301.038171]  [<ffffff[ 1301.039122]  [<ffffff[ 1301.040069]  [<ffffff[ 1301.041018]  [ 1301.041965]  [<ffffff[ 1301.042916]  [<ffffff[ 1301.043880]  [<ffffff[ 1301.044842]  [<ffffff[ 1301.045808]  [ 1301.046780]  [<ffffff[ 1301.047737]  [<ffffff[ 1301.048705]  [<ffffff[ 1301.049669]  [ 1301.050632]  [<ffffff[ 1301.051594]  [<ffffff[ 1301.052557]  [<ffffff[ 1301.053517]  [<ffffff[ 1301.054476]  [ 1301.055430]  [<ffffff[ 1301.056392]  [<ffffff[ 1301.057361]  [<ffffff[ 1301.058329]  [ 1301.059303]  [<ffffff[ 1301.060281]  [<ffffff[ 1301.061247]  [<ffffff[ 1301.062217]  [<ffffff[ 1301.063180]  [ 1301.064146]  [<ffffff[ 1301.065110]  [<ffffff[ 1301.066069]  [<ffffff[ 1301.067028]  [<ffffff[ 1301.067982]  [ 1301.068944]  [<ffffff[ 1301.069899]  [<ffffff[ 1301.070860]  [<ffffff[ 1301.071834]  [ 1301.072808]  [<ffffff[ 1301.073781]  [<ffffff[ 1301.074761]  [<ffffff[ 1301.075732]  [<ffffff[ 1301.0[ 1301.0[ 1301.092183]  [<ffffff[ 1301.093150]  [ 1301.094113]  [<ffffff[ 1301.095081]  [<ffffff[ 1301.096050]  [<ffffff[ 1301.097011]  [<ffffff[ 1301.097975]  [ 1301.098944]  [<ffffff[ 1301.099917]  [<ffffff[ 1301.100891]  [<ffffff[ 1301.101859]  [<ffffff[ 1301.102834] -[ 1301.103835] Thread ov

and then total lockup.


I saw some of Steven's patches get merged on Friday, is there anything else
outstanding that didn't make it in yet that I could test ?
Or is this another new bug ?

	Dave

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 19:51 Another RCU trace. (3.10-rc5) Dave Jones
@ 2013-06-10 20:33 ` Paul E. McKenney
  2013-06-10 21:01   ` Dave Jones
  2013-06-10 20:38 ` Steven Rostedt
  1 sibling, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2013-06-10 20:33 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, fweisbec, rostedt

On Mon, Jun 10, 2013 at 03:51:45PM -0400, Dave Jones wrote:
> Something in this code still isn't right in rc5.
> I got a pretty mangled trace (and then total lockup) pretty quickly..
> 
> [ 1300.705212] ===============================
> [ 1300.707514] [ INFO: suspicious RCU usage. ]
> [ 1300.709794] 3.10.0-rc5+ #6 Not tainted
> [ 1300.712064] -------------------------------
> [ 1300.714363] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> [ 1300.717516] 
> [ 1300.717516] other info that might help us debug this:
> [ 1300.717516] 
> [ 1300.723450] 
> [ 1300.723450] RCU used illegally from idle CPU!
> [ 1300.723450] rcu_scheduler_active = 1, debug_locks = 0
> [ 1300.730124] RCU used illegally from extended quiescent state!
> [ 1300.732971] 2 locks held by kworker/0:0/12904:
> [ 1300.735395]  #0:  (&(&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [<ffffffff8109203e>] vtime_account_system.part.1+0x1e/0x50
> [ 1300.740229]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810a5815>] cpuacct_account_field+0x5/0x200
> [ 1300.744305] 
> [ 1300.744305] stack backtrace:
> [ 1300.747817] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6 
> [ 1300.756486] ------------[ cut here ]------------
> [ 1300.759463] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.780868] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.783521] ------------[ cut here ]------------
> [ 1300.784848] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.792216] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.795211] ------------[ cut here ]------------
> [ 1300.796734] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.804965] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.808270] ------------[ cut here ]------------
> [ 1300.809921] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.811591] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
> [ 1300.818924] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.822581] ------------[ cut here ]------------
> [ 1300.824419] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.826271] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
> [ 1300.834252] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.838244] ------------[ cut here ]------------
> [ 1300.840231] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.895079] WARNING: at kernel/sched/cputime.c:652 get_v[ 1300.8 af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_h snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e sn
> [ 1300.907729] CPU: 0 PID: 12904 Comm: kworker/0:0 Not
> [ 1300.914988]  ffffffff8104a1b1 000000000b89a53f ffff
> [ 1300.917385]  0000000000000008 ffffffff81101b57 ffff
> [ 1300.919760] Call Trace:
> [ 1300.922120]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 1300.924429]  [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> [ 1300.926721]  [<ffffffff81101b57>] ? rcu_eqs_exit_common+0x357/0x380
> [ 1300.928993]  [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> [ 1300.931243]  [<ffffffff81091b94>] get_vtime_delta+0x64/0x80
> [ 1300.933487]  [<ffffffff810922f9>] vtime_account_user+0x29/0x[ 1300.935702]  [<ffffffff81145285>] user_exit+0[ 1300.937882]  [<ffffffff816f0591>] do_page_fau[ 1300.940037]  [<ffffffff816ed162>] page_fault+0x22/0x3[ 1300.942158]  [<ffffffff81101b57>] ? rcu_eqs_e[ 1300.944270]  [<ffffffff81308fb9>] ? copy_user[ 1300.946365]  [<ffffffff8114b46c>] ? __probe_k[ 1300.948437]  [<ffffffff81072598>] print_worke[ 1300.950502]  [<ffffffff8104de32>] dump_stack_[ 1300.952543]  [<ffffffff816e39d2>] dum[ 1300.954546]  [<ffffffff8104a1b1>] warn_slowpa[ 1300.956525]  [<ffffffff8104a28a>] warn_slowpa[ 1300.958471]  [<ffffffff81101b57>] rcu[ 1300.960401]  [<ffffffff81091b94>] ? get_vtime[ 1300.962309]  [<ffffffff81101bda>] rcu[ 1300.964196]  [<ffffffff811039e3>] rcu_user_ex[ 1300.966071]  [<ffffffff81145275>] use[ 1300.967911]  [<ffffffff816f0591>] do_[ 1300.969729]  [<ffffffff816ed162>] pag[ 1300.971510]  [<ffffffff81091b94>] ? get_vtime[ 1300.973270]  [<ffffffff81308fb9>] ? c[ 1300.975014]  [ 1300.989735]  [ 1300.991250]  [<ffffffff816f05[ 1300.992759]  [<ffffffff816ed1[ 1300.994234]  [<ffffffff81101b57>] ? r[ 1300.995698]  [<ffffffff81308f[ 1300.997130]  [<ffffffff8114b4[ 1300.998534]  [<ffffffff810725[ 1300.999918]  [<ffffffff8104de[ 1301.001284]  [<ffffffff816e39[ 1301.002620]  [<ffffffff8104a1[ 1301.003955]  [<ffffff[ 1301.005255]  [<ffffffff81101b[ 1301.006533]  [<ffffffff81091b[ 1301.007780]  [<ffffff[ 1301.008997]  [<ffffffff811039[ 1301.010190]  [<ffffff[ 1301.011355]  [<ffffff[ 1301.012504]  [<ffffffff816ed1[ 1301.013642]  [<ffffff[ 1301.014758]  [<ffffff[ 1301.015852]  [<ffffff[ 1301.016913]  [<ffffff[ 1301.017948]  [<ffffff[ 1301.018965]  [<ffffff[ 1301.019951]  [<ffffff[ 1301.020924]  [ 1301.021895]  [<ffffff[ 1301.022861]  [<ffffff[ 1301.023820]  [<ffffff[ 1301.024776]  [<ffffff[ 1301.025726]  [ 1301.026679]  [<ffffff[ 1301.027626]  [<ffffff[ 1301.028585]  [<ffffff[ 1301.029544]  [ 1301.030502]  [<ffffff[ 1301.031463]  [<ffffff[ 1301.032430]  [<ffffff[ 1301.033391]  [<ffffff[ 1301.034351]  [ 1301.035305]  [<ffffff[ 1301.036264]  [<ffffff[ 1301.037218]  [ 1301.038171]  [<ffffff[ 1301.039122]  [<ffffff[ 1301.040069]  [<ffffff[ 1301.041018]  [ 1301.041965]  [<ffffff[ 1301.042916]  [<ffffff[ 1301.043880]  [<ffffff[ 1301.044842]  [<ffffff[ 1301.045808]  [ 1301.046780]  [<ffffff[ 1301.047737]  [<ffffff[ 1301.048705]  [<ffffff[ 1301.049669]  [ 1301.050632]  [<ffffff[ 1301.051594]  [<ffffff[ 1301.052557]  [<ffffff[ 1301.053517]  [<ffffff[ 1301.054476]  [ 1301.055430]  [<ffffff[ 1301.056392]  [<ffffff[ 1301.057361]  [<ffffff[ 1301.058329]  [ 1301.059303]  [<ffffff[ 1301.060281]  [<ffffff[ 1301.061247]  [<ffffff[ 1301.062217]  [<ffffff[ 1301.063180]  [ 1301.064146]  [<ffffff[ 1301.065110]  [<ffffff[ 1301.066069]  [<ffffff[ 1301.067028]  [<ffffff[ 1301.067982]  [ 1301.068944]  [<ffffff[ 1301.069899]  [<ffffff[ 1301.070860]  [<ffffff[ 1301.071834]  [ 1301.072808]  [<ffffff[ 1301.073781]  [<ffffff[ 1301.074761]  [<ffffff[ 1301.075732]  [<ffffff[ 1301.0[ 1301.0[ 1301.092183]  [<ffffff[ 1301.093150]  [ 1301.094113]  [<ffffff[ 1301.095081]  [<ffffff[ 1301.096050]  [<ffffff[ 1301.097011]  [<ffffff[ 1301.097975]  [ 1301.098944]  [<ffffff[ 1301.099917]  [<ffffff[ 1301.100891]  [<ffffff[ 1301.101859]  [<ffffff[ 1301.102834] -[ 1301.103835] Thread ov
> 
> and then total lockup.
> 
> 
> I saw some of Steven's patches get merged on Friday, is there anything else
> outstanding that didn't make it in yet that I could test ?
> Or is this another new bug ?

I have three fixes queued up at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent

Kind of hard to tell whether they are relevant given the interleaved
stack traces, but can't hurt to try them out.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 19:51 Another RCU trace. (3.10-rc5) Dave Jones
  2013-06-10 20:33 ` Paul E. McKenney
@ 2013-06-10 20:38 ` Steven Rostedt
  1 sibling, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2013-06-10 20:38 UTC (permalink / raw)
  To: Dave Jones; +Cc: Paul E. McKenney, Linux Kernel, fweisbec

On Mon, 2013-06-10 at 15:51 -0400, Dave Jones wrote:
> Something in this code still isn't right in rc5.
> I got a pretty mangled trace (and then total lockup) pretty quickly..
> 
> [ 1300.705212] ===============================
> [ 1300.707514] [ INFO: suspicious RCU usage. ]
> [ 1300.709794] 3.10.0-rc5+ #6 Not tainted
> [ 1300.712064] -------------------------------
> [ 1300.714363] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> [ 1300.717516] 
> [ 1300.717516] other info that might help us debug this:
> [ 1300.717516] 
> [ 1300.723450] 
> [ 1300.723450] RCU used illegally from idle CPU!
> [ 1300.723450] rcu_scheduler_active = 1, debug_locks = 0
> [ 1300.730124] RCU used illegally from extended quiescent state!
> [ 1300.732971] 2 locks held by kworker/0:0/12904:
> [ 1300.735395]  #0:  (&(&(&p->vtime_seqlock)->lock)->rlock){-.-.-.}, at: [<ffffffff8109203e>] vtime_account_system.part.1+0x1e/0x50
> [ 1300.740229]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810a5815>] cpuacct_account_field+0x5/0x200
> [ 1300.744305] 
> [ 1300.744305] stack backtrace:
> [ 1300.747817] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6 
> [ 1300.756486] ------------[ cut here ]------------
> [ 1300.759463] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.780868] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.783521] ------------[ cut here ]------------
> [ 1300.784848] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.792216] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.795211] ------------[ cut here ]------------
> [ 1300.796734] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.804965] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.808270] ------------[ cut here ]------------
> [ 1300.809921] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.811591] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
> [ 1300.818924] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.822581] ------------[ cut here ]------------
> [ 1300.824419] WARNING: at kernel/sched/cputime.c:652 get_vtime_delta+0x64/0x80()
> [ 1300.826271] Modules linked in: tun rfcomm fuse bnep hidp ipt_ULOG nfnetlink scsi_transport_iscsi can_raw af_rxrpc rds nfc can_bcm netrom decnet rose af_key af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe caif_socket pppox caif ppp_generic slhc irda ipx p8023 psnap p8022 llc crc_ccitt atm ax25 phonet nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_hda_codec_realtek kvm crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e snd_seq_device usb_debug snd_pcm ptp snd_page_alloc pps_core snd_timer snd soundcore xfs libcrc32c
> [ 1300.834252] CPU: 0 PID: 12904 Comm: kworker/0:0 Not tainted 3.10.0-rc5+ #6
> [ 1300.838244] ------------[ cut here ]------------
> [ 1300.840231] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common+0x357/0x380()
> [ 1300.895079] WARNING: at kernel/sched/cputime.c:652 get_v[ 1300.8 af_802154 can appletalk x25 bluetooth llc2 rfkill pppoe nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel snd_h snd_hda_intel snd_hda_codec snd_hwdep snd_seq e1000e sn
> [ 1300.907729] CPU: 0 PID: 12904 Comm: kworker/0:0 Not
> [ 1300.914988]  ffffffff8104a1b1 000000000b89a53f ffff
> [ 1300.917385]  0000000000000008 ffffffff81101b57 ffff
> [ 1300.919760] Call Trace:
> [ 1300.922120]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 1300.924429]  [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> [ 1300.926721]  [<ffffffff81101b57>] ? rcu_eqs_exit_common+0x357/0x380
> [ 1300.928993]  [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> [ 1300.931243]  [<ffffffff81091b94>] get_vtime_delta+0x64/0x80
> [ 1300.933487]  [<ffffffff810922f9>] vtime_account_user+0x29/0x[ 1300.935702]  [<ffffffff81145285>] user_exit+0[ 1300.937882]  [<ffffffff816f0591>] do_page_fau[ 1300.940037]  [<ffffffff816ed162>] page_fault+0x22/0x3[ 1300.942158]  [<ffffffff81101b57>] ? rcu_eqs_e[ 1300.944270]  [<ffffffff81308fb9>] ? copy_user[ 1300.946365]  [<ffffffff8114b46c>] ? __probe_k[ 1300.948437]  [<ffffffff81072598>] print_worke[ 1300.950502]  [<ffffffff8104de32>] dump_stack_[ 1300.952543]  [<ffffffff816e39d2>] dum[ 1300.954546]  [<ffffffff8104a1b1>] warn_slowpa[ 1300.956525]  [<ffffffff8104a28a>] warn_slowpa[ 1300.958471]  [<ffffffff81101b57>] rcu[ 1300.960401]  [<ffffffff81091b94>] ? get_vtime[ 1300.962309]  [<ffffffff81101bda>] rcu[ 1300.964196]  [<ffffffff811039e3>] rcu_user_ex[ 1300.966071]  [<ffffffff81145275>] use[ 1300.967911]  [<ffffffff816f0591>] do_[ 1300.969729]  [<ffffffff816ed162>] pag[ 1300.971510]  [<ffffffff81091b94>] ? get_vtime[ 1300.973270]  [<ffffffff81308fb9>] ? c[ 1300.975014]  [ 1300.989735]  [ 1300.991250]  [<ffffffff816f05[ 1300.992759]  [<ffffffff816ed1[ 1300.994234]  [<ffffffff81101b57>] ? r[ 1300.995698]  [<ffffffff81308f[ 1300.997130]  [<ffffffff8114b4[ 1300.998534]  [<ffffffff810725[ 1300.999918]  [<ffffffff8104de[ 1301.001284]  [<ffffffff816e39[ 1301.002620]  [<ffffffff8104a1[ 1301.003955]  [<ffffff[ 1301.005255]  [<ffffffff81101b[ 1301.006533]  [<ffffffff81091b[ 1301.007780]  [<ffffff[ 1301.008997]  [<ffffffff811039[ 1301.010190]  [<ffffff[ 1301.011355]  [<ffffff[ 1301.012504]  [<ffffffff816ed1[ 1301.013642]  [<ffffff[ 1301.014758]  [<ffffff[ 1301.015852]  [<ffffff[ 1301.016913]  [<ffffff[ 1301.017948]  [<ffffff[ 1301.018965]  [<ffffff[ 1301.019951]  [<ffffff[ 1301.020924]  [ 1301.021895]  [<ffffff[ 1301.022861]  [<ffffff[ 1301.023820]  [<ffffff[ 1301.024776]  [<ffffff[ 1301.025726]  [ 1301.026679]  [<ffffff[ 1301.027626]  [<ffffff[ 1301.028585]  [<ffffff[ 1301.029544]  [ 1301.030502]  [<ffffff[ 1301.031463]  [<ffffff[ 1301.032430]  [<ffffff[ 1301.033391]  [<ffffff[ 1301.034351]  [ 1301.035305]  [<ffffff[ 1301.036264]  [<ffffff[ 1301.037218]  [ 1301.038171]  [<ffffff[ 1301.039122]  [<ffffff[ 1301.040069]  [<ffffff[ 1301.041018]  [ 1301.041965]  [<ffffff[ 1301.042916]  [<ffffff[ 1301.043880]  [<ffffff[ 1301.044842]  [<ffffff[ 1301.045808]  [ 1301.046780]  [<ffffff[ 1301.047737]  [<ffffff[ 1301.048705]  [<ffffff[ 1301.049669]  [ 1301.050632]  [<ffffff[ 1301.051594]  [<ffffff[ 1301.052557]  [<ffffff[ 1301.053517]  [<ffffff[ 1301.054476]  [ 1301.055430]  [<ffffff[ 1301.056392]  [<ffffff[ 1301.057361]  [<ffffff[ 1301.058329]  [ 1301.059303]  [<ffffff[ 1301.060281]  [<ffffff[ 1301.061247]  [<ffffff[ 1301.062217]  [<ffffff[ 1301.063180]  [ 1301.064146]  [<ffffff[ 1301.065110]  [<ffffff[ 1301.066069]  [<ffffff[ 1301.067028]  [<ffffff[ 1301.067982]  [ 1301.068944]  [<ffffff[ 1301.069899]  [<ffffff[ 1301.070860]  [<ffffff[ 1301.071834]  [ 1301.072808]  [<ffffff[ 1301.073781]  [<ffffff[ 1301.074761]  [<ffffff[ 1301.075732]  [<ffffff[ 1301.0[ 1301.0[ 1301.092183]  [<ffffff[ 1301.093150]  [ 1301.094113]  [<ffffff[ 1301.095081]  [<ffffff[ 1301.096050]  [<ffffff[ 1301.097011]  [<ffffff[ 1301.097975]  [ 1301.098944]  [<ffffff[ 1301.099917]  [<ffffff[ 1301.100891]  [<ffffff[ 1301.101859]  [<ffffff[ 1301.102834] -[ 1301.103835] Thread ov
> 
> and then total lockup.
> 
> 
> I saw some of Steven's patches get merged on Friday, is there anything else
> outstanding that didn't make it in yet that I could test ?
> Or is this another new bug ?

Looks to be another bug. Can you post the config you were using. Or is
this the same config as before?

Also, I don't see any tracing going on, so this could just be a context
tracking bug.

-- Steve



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 20:33 ` Paul E. McKenney
@ 2013-06-10 21:01   ` Dave Jones
  2013-06-10 21:16     ` Steven Rostedt
  2013-06-10 21:18     ` Paul E. McKenney
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Jones @ 2013-06-10 21:01 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel, fweisbec, rostedt

On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
 
 > > I saw some of Steven's patches get merged on Friday, is there anything else
 > > outstanding that didn't make it in yet that I could test ?
 > > Or is this another new bug ?
 > 
 > I have three fixes queued up at:
 > 
 > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
 > 
 > Kind of hard to tell whether they are relevant given the interleaved
 > stack traces, but can't hurt to try them out.
 
Here's another. Looks different.

[ 2739.921649] ===============================
[ 2739.923894] [ INFO: suspicious RCU usage. ]
[ 2739.926144] 3.10.0-rc5+ #6 Not tainted
[ 2739.928397] -------------------------------
[ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
[ 2739.933826] 
other info that might help us debug this:

[ 2739.939663] 
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
[ 2739.946345] RCU used illegally from extended quiescent state!
[ 2739.949123] 2 locks held by trinity-child1/4385:
[ 2739.951537]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
[ 2739.955316]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
[ 2739.959101] 
stack backtrace:
[ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
[ 2739.970870]  0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
[ 2739.974556]  ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
[ 2739.978353]  0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
[ 2739.982052] Call Trace:
[ 2739.984098]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
[ 2739.986996]  [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
[ 2739.990080]  [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
[ 2739.992971]  [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
[ 2739.994716]  [<ffffffff8109609c>] update_curr+0xec/0x250
[ 2739.995873]  [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
[ 2739.997036]  [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
[ 2739.998192]  [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
[ 2739.999344]  [<ffffffff816eae94>] preempt_schedule+0x44/0x60
[ 2740.000479]  [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
[ 2740.001622]  [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
[ 2740.002760]  [<ffffffff8110d73f>] ftrace_ops_control_func+0x10f/0x140
[ 2740.003908]  [<ffffffff816f4a00>] ftrace_call+0x5/0x2f
[ 2740.005046]  [<ffffffff816ecf5d>] ? retint_careful+0xb/0x2e
[ 2740.006188]  [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
[ 2740.007325]  [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
[ 2740.008454]  [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
[ 2757.542667] ------------[ cut here ]------------
[ 2757.544693] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
[ 2757.546644] Modules linked in: snd_seq_dummy fuse hidp bnep rfcomm ipt_ULOG can_raw nfnetlink can_bcm can decnet scsi_transport_iscsi pppoe pppox ppp_generic slhc af_rxrpc caif_socket caif bluetooth x25 phonet ipx p8023 p8022 rds nfc rfkill netrom af_key irda appletalk psnap crc_ccitt llc2 llc rose ax25 atm af_802154 nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek microcode pcspkr snd_hda_codec_hdmi usb_debug snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
[ 2757.559111] CPU: 2 PID: 7910 Comm: kworker/u8:2 Not tainted 3.10.0-rc5+ #6
[ 2757.564356]  ffffffff81a18e2d ffff880200db3ba8 ffffffff816e39db ffff880200db3be0
[ 2757.566581]  ffffffff8104a1b1 0000000000000001 ffff880200db3fd8 00000000ffffffff
[ 2757.568813]  ffff8802434cc240 ffff880244dd9160 ffff880200db3bf0 ffffffff8104a28a
[ 2757.571053] Call Trace:
[ 2757.572639]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
[ 2757.574545]  [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
[ 2757.576548]  [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
[ 2757.578526]  [<ffffffff811451f5>] user_enter+0xb5/0xd0
[ 2757.580424]  [<ffffffff816eb0d5>] preempt_schedule_irq+0x85/0x90
[ 2757.582424]  [<ffffffff816ed036>] retint_kernel+0x26/0x30
[ 2757.584344]  [<ffffffff81199877>] ? deactivate_slab+0x627/0x740
[ 2757.586330]  [<ffffffff816e2957>] ? __slab_alloc+0x457/0x507
[ 2757.588279]  [<ffffffff816e295c>] ? __slab_alloc+0x45c/0x507
[ 2757.590237]  [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
[ 2757.592145]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
[ 2757.594156]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
[ 2757.596166]  [<ffffffff8119cd97>] kmem_cache_alloc+0x257/0x320
[ 2757.598129]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
[ 2757.600138]  [<ffffffff8110890e>] __delayacct_tsk_init+0x1e/0x50
[ 2757.602127]  [<ffffffff81048425>] copy_process.part.28+0x805/0x15a0
[ 2757.604144]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
[ 2757.606242]  [<ffffffff810492ab>] do_fork+0xbb/0x470
[ 2757.608098]  [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
[ 2757.609994]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
[ 2757.612099]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
[ 2757.614196]  [<ffffffff81049686>] kernel_thread+0x26/0x30
[ 2757.616104]  [<ffffffff8106a418>] wait_for_helper+0x68/0xa0
[ 2757.618037]  [<ffffffff8108b910>] ? schedule_tail+0x30/0xb0
[ 2757.619961]  [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
[ 2757.621959]  [<ffffffff816f4d1c>] ret_from_fork+0x7c/0xb0
[ 2757.623846]  [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
[ 2757.625829] ---[ end trace 312da8c9d32e2f2c ]---



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 21:01   ` Dave Jones
@ 2013-06-10 21:16     ` Steven Rostedt
  2013-06-18  9:58       ` Peter Zijlstra
  2013-06-10 21:18     ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2013-06-10 21:16 UTC (permalink / raw)
  To: Dave Jones; +Cc: Paul E. McKenney, Linux Kernel, fweisbec, Peter Zijlstra

On Mon, 2013-06-10 at 17:01 -0400, Dave Jones wrote:
> On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
>  
>  > > I saw some of Steven's patches get merged on Friday, is there anything else
>  > > outstanding that didn't make it in yet that I could test ?
>  > > Or is this another new bug ?
>  > 
>  > I have three fixes queued up at:
>  > 
>  > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
>  > 
>  > Kind of hard to tell whether they are relevant given the interleaved
>  > stack traces, but can't hurt to try them out.
>  
> Here's another. Looks different.
> 
> [ 2739.921649] ===============================
> [ 2739.923894] [ INFO: suspicious RCU usage. ]
> [ 2739.926144] 3.10.0-rc5+ #6 Not tainted
> [ 2739.928397] -------------------------------
> [ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> [ 2739.933826] 
> other info that might help us debug this:
> 
> [ 2739.939663] 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> [ 2739.946345] RCU used illegally from extended quiescent state!
> [ 2739.949123] 2 locks held by trinity-child1/4385:
> [ 2739.951537]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
> [ 2739.955316]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
> [ 2739.959101] 
> stack backtrace:
> [ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
> [ 2739.970870]  0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
> [ 2739.974556]  ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
> [ 2739.978353]  0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
> [ 2739.982052] Call Trace:
> [ 2739.984098]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 2739.986996]  [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
> [ 2739.990080]  [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
> [ 2739.992971]  [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
> [ 2739.994716]  [<ffffffff8109609c>] update_curr+0xec/0x250
> [ 2739.995873]  [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
> [ 2739.997036]  [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
> [ 2739.998192]  [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> [ 2739.999344]  [<ffffffff816eae94>] preempt_schedule+0x44/0x60

Yeah, this one is fixed by a patch I sent out earlier, and I believe
Peter Zijlstra is going to push it. It wasn't part of my queue.

Peter, Are you going to take the preempt_schedule_context() patch?

-- Steve

> [ 2740.000479]  [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> [ 2740.001622]  [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> [ 2740.002760]  [<ffffffff8110d73f>] ftrace_ops_control_func+0x10f/0x140
> [ 2740.003908]  [<ffffffff816f4a00>] ftrace_call+0x5/0x2f
> [ 2740.005046]  [<ffffffff816ecf5d>] ? retint_careful+0xb/0x2e
> [ 2740.006188]  [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> [ 2740.007325]  [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> [ 2740.008454]  [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> [ 2757.542667] ------------[ cut here ]------------
> [ 2757.544693] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
> [ 2757.546644] Modules linked in: snd_seq_dummy fuse hidp bnep rfcomm ipt_ULOG can_raw nfnetlink can_bcm can decnet scsi_transport_iscsi pppoe pppox ppp_generic slhc af_rxrpc caif_socket caif bluetooth x25 phonet ipx p8023 p8022 rds nfc rfkill netrom af_key irda appletalk psnap crc_ccitt llc2 llc rose ax25 atm af_802154 nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek microcode pcspkr snd_hda_codec_hdmi usb_debug snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
> [ 2757.559111] CPU: 2 PID: 7910 Comm: kworker/u8:2 Not tainted 3.10.0-rc5+ #6
> [ 2757.564356]  ffffffff81a18e2d ffff880200db3ba8 ffffffff816e39db ffff880200db3be0
> [ 2757.566581]  ffffffff8104a1b1 0000000000000001 ffff880200db3fd8 00000000ffffffff
> [ 2757.568813]  ffff8802434cc240 ffff880244dd9160 ffff880200db3bf0 ffffffff8104a28a
> [ 2757.571053] Call Trace:
> [ 2757.572639]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 2757.574545]  [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> [ 2757.576548]  [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> [ 2757.578526]  [<ffffffff811451f5>] user_enter+0xb5/0xd0
> [ 2757.580424]  [<ffffffff816eb0d5>] preempt_schedule_irq+0x85/0x90
> [ 2757.582424]  [<ffffffff816ed036>] retint_kernel+0x26/0x30
> [ 2757.584344]  [<ffffffff81199877>] ? deactivate_slab+0x627/0x740
> [ 2757.586330]  [<ffffffff816e2957>] ? __slab_alloc+0x457/0x507
> [ 2757.588279]  [<ffffffff816e295c>] ? __slab_alloc+0x45c/0x507
> [ 2757.590237]  [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> [ 2757.592145]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.594156]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.596166]  [<ffffffff8119cd97>] kmem_cache_alloc+0x257/0x320
> [ 2757.598129]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.600138]  [<ffffffff8110890e>] __delayacct_tsk_init+0x1e/0x50
> [ 2757.602127]  [<ffffffff81048425>] copy_process.part.28+0x805/0x15a0
> [ 2757.604144]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.606242]  [<ffffffff810492ab>] do_fork+0xbb/0x470
> [ 2757.608098]  [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> [ 2757.609994]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.612099]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.614196]  [<ffffffff81049686>] kernel_thread+0x26/0x30
> [ 2757.616104]  [<ffffffff8106a418>] wait_for_helper+0x68/0xa0
> [ 2757.618037]  [<ffffffff8108b910>] ? schedule_tail+0x30/0xb0
> [ 2757.619961]  [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> [ 2757.621959]  [<ffffffff816f4d1c>] ret_from_fork+0x7c/0xb0
> [ 2757.623846]  [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> [ 2757.625829] ---[ end trace 312da8c9d32e2f2c ]---
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 21:01   ` Dave Jones
  2013-06-10 21:16     ` Steven Rostedt
@ 2013-06-10 21:18     ` Paul E. McKenney
  2013-06-10 21:37       ` Steven Rostedt
  1 sibling, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2013-06-10 21:18 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, fweisbec, rostedt

On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
> On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
> 
>  > > I saw some of Steven's patches get merged on Friday, is there anything else
>  > > outstanding that didn't make it in yet that I could test ?
>  > > Or is this another new bug ?
>  > 
>  > I have three fixes queued up at:
>  > 
>  > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
>  > 
>  > Kind of hard to tell whether they are relevant given the interleaved
>  > stack traces, but can't hurt to try them out.
> 
> Here's another. Looks different.

I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
called from irq) from the above git archive fixes this one.  Just don't
ask how much I am willing to bet.  ;-)

								Thanx, Paul

> [ 2739.921649] ===============================
> [ 2739.923894] [ INFO: suspicious RCU usage. ]
> [ 2739.926144] 3.10.0-rc5+ #6 Not tainted
> [ 2739.928397] -------------------------------
> [ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> [ 2739.933826] 
> other info that might help us debug this:
> 
> [ 2739.939663] 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> [ 2739.946345] RCU used illegally from extended quiescent state!
> [ 2739.949123] 2 locks held by trinity-child1/4385:
> [ 2739.951537]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
> [ 2739.955316]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
> [ 2739.959101] 
> stack backtrace:
> [ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
> [ 2739.970870]  0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
> [ 2739.974556]  ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
> [ 2739.978353]  0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
> [ 2739.982052] Call Trace:
> [ 2739.984098]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 2739.986996]  [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
> [ 2739.990080]  [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
> [ 2739.992971]  [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
> [ 2739.994716]  [<ffffffff8109609c>] update_curr+0xec/0x250
> [ 2739.995873]  [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
> [ 2739.997036]  [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
> [ 2739.998192]  [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> [ 2739.999344]  [<ffffffff816eae94>] preempt_schedule+0x44/0x60
> [ 2740.000479]  [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> [ 2740.001622]  [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> [ 2740.002760]  [<ffffffff8110d73f>] ftrace_ops_control_func+0x10f/0x140
> [ 2740.003908]  [<ffffffff816f4a00>] ftrace_call+0x5/0x2f
> [ 2740.005046]  [<ffffffff816ecf5d>] ? retint_careful+0xb/0x2e
> [ 2740.006188]  [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> [ 2740.007325]  [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> [ 2740.008454]  [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> [ 2757.542667] ------------[ cut here ]------------
> [ 2757.544693] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
> [ 2757.546644] Modules linked in: snd_seq_dummy fuse hidp bnep rfcomm ipt_ULOG can_raw nfnetlink can_bcm can decnet scsi_transport_iscsi pppoe pppox ppp_generic slhc af_rxrpc caif_socket caif bluetooth x25 phonet ipx p8023 p8022 rds nfc rfkill netrom af_key irda appletalk psnap crc_ccitt llc2 llc rose ax25 atm af_802154 nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek microcode pcspkr snd_hda_codec_hdmi usb_debug snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
> [ 2757.559111] CPU: 2 PID: 7910 Comm: kworker/u8:2 Not tainted 3.10.0-rc5+ #6
> [ 2757.564356]  ffffffff81a18e2d ffff880200db3ba8 ffffffff816e39db ffff880200db3be0
> [ 2757.566581]  ffffffff8104a1b1 0000000000000001 ffff880200db3fd8 00000000ffffffff
> [ 2757.568813]  ffff8802434cc240 ffff880244dd9160 ffff880200db3bf0 ffffffff8104a28a
> [ 2757.571053] Call Trace:
> [ 2757.572639]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
> [ 2757.574545]  [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> [ 2757.576548]  [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> [ 2757.578526]  [<ffffffff811451f5>] user_enter+0xb5/0xd0
> [ 2757.580424]  [<ffffffff816eb0d5>] preempt_schedule_irq+0x85/0x90
> [ 2757.582424]  [<ffffffff816ed036>] retint_kernel+0x26/0x30
> [ 2757.584344]  [<ffffffff81199877>] ? deactivate_slab+0x627/0x740
> [ 2757.586330]  [<ffffffff816e2957>] ? __slab_alloc+0x457/0x507
> [ 2757.588279]  [<ffffffff816e295c>] ? __slab_alloc+0x45c/0x507
> [ 2757.590237]  [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> [ 2757.592145]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.594156]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.596166]  [<ffffffff8119cd97>] kmem_cache_alloc+0x257/0x320
> [ 2757.598129]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> [ 2757.600138]  [<ffffffff8110890e>] __delayacct_tsk_init+0x1e/0x50
> [ 2757.602127]  [<ffffffff81048425>] copy_process.part.28+0x805/0x15a0
> [ 2757.604144]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.606242]  [<ffffffff810492ab>] do_fork+0xbb/0x470
> [ 2757.608098]  [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> [ 2757.609994]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.612099]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> [ 2757.614196]  [<ffffffff81049686>] kernel_thread+0x26/0x30
> [ 2757.616104]  [<ffffffff8106a418>] wait_for_helper+0x68/0xa0
> [ 2757.618037]  [<ffffffff8108b910>] ? schedule_tail+0x30/0xb0
> [ 2757.619961]  [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> [ 2757.621959]  [<ffffffff816f4d1c>] ret_from_fork+0x7c/0xb0
> [ 2757.623846]  [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> [ 2757.625829] ---[ end trace 312da8c9d32e2f2c ]---
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 21:18     ` Paul E. McKenney
@ 2013-06-10 21:37       ` Steven Rostedt
  2013-06-10 21:54         ` Dave Jones
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2013-06-10 21:37 UTC (permalink / raw)
  To: paulmck; +Cc: Dave Jones, Linux Kernel, fweisbec

On Mon, 2013-06-10 at 14:18 -0700, Paul E. McKenney wrote:
> On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
> > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
> > 
> >  > > I saw some of Steven's patches get merged on Friday, is there anything else
> >  > > outstanding that didn't make it in yet that I could test ?
> >  > > Or is this another new bug ?
> >  > 
> >  > I have three fixes queued up at:
> >  > 
> >  > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
> >  > 
> >  > Kind of hard to tell whether they are relevant given the interleaved
> >  > stack traces, but can't hurt to try them out.
> > 
> > Here's another. Looks different.
> 
> I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
> called from irq) from the above git archive fixes this one.  Just don't
> ask how much I am willing to bet.  ;-)

Don't bet much ;-)  This has nothing to do with tracepoints. It's due to
the function tracer. And the perf function tracer at that, as that's the
only thing that currently uses "ftrace_ops_control_func" that's in the
backtrace below.

-- Steve

> 
> 								Thanx, Paul
> 
> > [ 2739.921649] ===============================
> > [ 2739.923894] [ INFO: suspicious RCU usage. ]
> > [ 2739.926144] 3.10.0-rc5+ #6 Not tainted
> > [ 2739.928397] -------------------------------
> > [ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> > [ 2739.933826] 
> > other info that might help us debug this:
> > 
> > [ 2739.939663] 
> > RCU used illegally from idle CPU!
> > rcu_scheduler_active = 1, debug_locks = 0
> > [ 2739.946345] RCU used illegally from extended quiescent state!
> > [ 2739.949123] 2 locks held by trinity-child1/4385:
> > [ 2739.951537]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
> > [ 2739.955316]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
> > [ 2739.959101] 
> > stack backtrace:
> > [ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
> > [ 2739.970870]  0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
> > [ 2739.974556]  ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
> > [ 2739.978353]  0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
> > [ 2739.982052] Call Trace:
> > [ 2739.984098]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
> > [ 2739.986996]  [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
> > [ 2739.990080]  [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
> > [ 2739.992971]  [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
> > [ 2739.994716]  [<ffffffff8109609c>] update_curr+0xec/0x250
> > [ 2739.995873]  [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
> > [ 2739.997036]  [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
> > [ 2739.998192]  [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> > [ 2739.999344]  [<ffffffff816eae94>] preempt_schedule+0x44/0x60
> > [ 2740.000479]  [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> > [ 2740.001622]  [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> > [ 2740.002760]  [<ffffffff8110d73f>] ftrace_ops_control_func+0x10f/0x140
> > [ 2740.003908]  [<ffffffff816f4a00>] ftrace_call+0x5/0x2f
> > [ 2740.005046]  [<ffffffff816ecf5d>] ? retint_careful+0xb/0x2e
> > [ 2740.006188]  [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> > [ 2740.007325]  [<ffffffff816eaf65>] ? schedule_user+0x5/0x70
> > [ 2740.008454]  [<ffffffff816ecf64>] ? retint_careful+0x12/0x2e
> > [ 2757.542667] ------------[ cut here ]------------
> > [ 2757.544693] WARNING: at kernel/context_tracking.c:55 user_enter+0xb5/0xd0()
> > [ 2757.546644] Modules linked in: snd_seq_dummy fuse hidp bnep rfcomm ipt_ULOG can_raw nfnetlink can_bcm can decnet scsi_transport_iscsi pppoe pppox ppp_generic slhc af_rxrpc caif_socket caif bluetooth x25 phonet ipx p8023 p8022 rds nfc rfkill netrom af_key irda appletalk psnap crc_ccitt llc2 llc rose ax25 atm af_802154 nfsv3 nfs_acl nfs lockd sunrpc fscache coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek microcode pcspkr snd_hda_codec_hdmi usb_debug snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
> > [ 2757.559111] CPU: 2 PID: 7910 Comm: kworker/u8:2 Not tainted 3.10.0-rc5+ #6
> > [ 2757.564356]  ffffffff81a18e2d ffff880200db3ba8 ffffffff816e39db ffff880200db3be0
> > [ 2757.566581]  ffffffff8104a1b1 0000000000000001 ffff880200db3fd8 00000000ffffffff
> > [ 2757.568813]  ffff8802434cc240 ffff880244dd9160 ffff880200db3bf0 ffffffff8104a28a
> > [ 2757.571053] Call Trace:
> > [ 2757.572639]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
> > [ 2757.574545]  [<ffffffff8104a1b1>] warn_slowpath_common+0x61/0x80
> > [ 2757.576548]  [<ffffffff8104a28a>] warn_slowpath_null+0x1a/0x20
> > [ 2757.578526]  [<ffffffff811451f5>] user_enter+0xb5/0xd0
> > [ 2757.580424]  [<ffffffff816eb0d5>] preempt_schedule_irq+0x85/0x90
> > [ 2757.582424]  [<ffffffff816ed036>] retint_kernel+0x26/0x30
> > [ 2757.584344]  [<ffffffff81199877>] ? deactivate_slab+0x627/0x740
> > [ 2757.586330]  [<ffffffff816e2957>] ? __slab_alloc+0x457/0x507
> > [ 2757.588279]  [<ffffffff816e295c>] ? __slab_alloc+0x45c/0x507
> > [ 2757.590237]  [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> > [ 2757.592145]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> > [ 2757.594156]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> > [ 2757.596166]  [<ffffffff8119cd97>] kmem_cache_alloc+0x257/0x320
> > [ 2757.598129]  [<ffffffff8110890e>] ? __delayacct_tsk_init+0x1e/0x50
> > [ 2757.600138]  [<ffffffff8110890e>] __delayacct_tsk_init+0x1e/0x50
> > [ 2757.602127]  [<ffffffff81048425>] copy_process.part.28+0x805/0x15a0
> > [ 2757.604144]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> > [ 2757.606242]  [<ffffffff810492ab>] do_fork+0xbb/0x470
> > [ 2757.608098]  [<ffffffff816f4a00>] ? ftrace_call+0x5/0x2f
> > [ 2757.609994]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> > [ 2757.612099]  [<ffffffff8106a1b0>] ? usermodehelper_read_lock_wait+0xc0/0xc0
> > [ 2757.614196]  [<ffffffff81049686>] kernel_thread+0x26/0x30
> > [ 2757.616104]  [<ffffffff8106a418>] wait_for_helper+0x68/0xa0
> > [ 2757.618037]  [<ffffffff8108b910>] ? schedule_tail+0x30/0xb0
> > [ 2757.619961]  [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> > [ 2757.621959]  [<ffffffff816f4d1c>] ret_from_fork+0x7c/0xb0
> > [ 2757.623846]  [<ffffffff8106a3b0>] ? __call_usermodehelper+0x90/0x90
> > [ 2757.625829] ---[ end trace 312da8c9d32e2f2c ]---
> > 
> > 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 21:37       ` Steven Rostedt
@ 2013-06-10 21:54         ` Dave Jones
  2013-06-10 22:10           ` Paul E. McKenney
  2013-06-10 22:10           ` Steven Rostedt
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Jones @ 2013-06-10 21:54 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: paulmck, Linux Kernel, fweisbec

On Mon, Jun 10, 2013 at 05:37:36PM -0400, Steven Rostedt wrote:
 > On Mon, 2013-06-10 at 14:18 -0700, Paul E. McKenney wrote:
 > > On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
 > > > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
 > > > 
 > > >  > > I saw some of Steven's patches get merged on Friday, is there anything else
 > > >  > > outstanding that didn't make it in yet that I could test ?
 > > >  > > Or is this another new bug ?
 > > >  > 
 > > >  > I have three fixes queued up at:
 > > >  > 
 > > >  > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
 > > >  > 
 > > >  > Kind of hard to tell whether they are relevant given the interleaved
 > > >  > stack traces, but can't hurt to try them out.
 > > > 
 > > > Here's another. Looks different.
 > > 
 > > I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
 > > called from irq) from the above git archive fixes this one.  Just don't
 > > ask how much I am willing to bet.  ;-)
 > 
 > Don't bet much ;-)  This has nothing to do with tracepoints. It's due to
 > the function tracer. 

dammit, 20 minutes after I finally cloned the repo.
Can we go back to posting diffs instead of hashes please ?

So while updating my list of bugs I've found this cycle, I noticed
I'd already posted this one a month ago on -rc2.
https://lkml.org/lkml/2013/5/21/327
which led us to this patch https://lkml.org/lkml/2013/5/24/379
After which I hit a bunch of what seem to be other RCU related bugs.

So maybe that patch was a winner after all and got dropped ?

	Dave
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 21:54         ` Dave Jones
@ 2013-06-10 22:10           ` Paul E. McKenney
  2013-06-10 22:25             ` Steven Rostedt
  2013-06-10 22:10           ` Steven Rostedt
  1 sibling, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2013-06-10 22:10 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Linux Kernel, fweisbec

On Mon, Jun 10, 2013 at 05:54:21PM -0400, Dave Jones wrote:
> On Mon, Jun 10, 2013 at 05:37:36PM -0400, Steven Rostedt wrote:
>  > On Mon, 2013-06-10 at 14:18 -0700, Paul E. McKenney wrote:
>  > > On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
>  > > > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
>  > > > 
>  > > >  > > I saw some of Steven's patches get merged on Friday, is there anything else
>  > > >  > > outstanding that didn't make it in yet that I could test ?
>  > > >  > > Or is this another new bug ?
>  > > >  > 
>  > > >  > I have three fixes queued up at:
>  > > >  > 
>  > > >  > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
>  > > >  > 
>  > > >  > Kind of hard to tell whether they are relevant given the interleaved
>  > > >  > stack traces, but can't hurt to try them out.
>  > > > 
>  > > > Here's another. Looks different.
>  > > 
>  > > I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
>  > > called from irq) from the above git archive fixes this one.  Just don't
>  > > ask how much I am willing to bet.  ;-)
>  > 
>  > Don't bet much ;-)  This has nothing to do with tracepoints. It's due to
>  > the function tracer. 
> 
> dammit, 20 minutes after I finally cloned the repo.
> Can we go back to posting diffs instead of hashes please ?
> 
> So while updating my list of bugs I've found this cycle, I noticed
> I'd already posted this one a month ago on -rc2.
> https://lkml.org/lkml/2013/5/21/327
> which led us to this patch https://lkml.org/lkml/2013/5/24/379
> After which I hit a bunch of what seem to be other RCU related bugs.
> 
> So maybe that patch was a winner after all and got dropped ?

Please see below for the patch that I was thinking of.

It has not been dropped, I was on travel and a bit slow about pushing
things.

							Thanx, Paul

------------------------------------------------------------------------

trace: Allow idle-safe tracepoints to be called from irq

__DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
may safely be invoked from what RCU considers to be an idle CPU.
However, these _rcuidle() tracepoints may -not- be invoked from the
handler of an irq taken from idle, because rcu_idle_enter() zeroes
RCU's nesting-level counter, so that the rcu_irq_exit() returning to
idle will trigger a WARN_ON_ONCE().

This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
tracepoints usable from irq handlers as well as from process context.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 2f322c3..f8e084d 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -145,8 +145,8 @@ static inline void tracepoint_synchronize_unregister(void)
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args),			\
 				TP_CONDITION(cond),			\
-				rcu_idle_exit(),			\
-				rcu_idle_enter());			\
+				rcu_irq_enter(),			\
+				rcu_irq_exit());			\
 	}
 #else
 #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args)


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 21:54         ` Dave Jones
  2013-06-10 22:10           ` Paul E. McKenney
@ 2013-06-10 22:10           ` Steven Rostedt
  1 sibling, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2013-06-10 22:10 UTC (permalink / raw)
  To: Dave Jones; +Cc: paulmck, Linux Kernel, fweisbec, Peter Zijlstra

On Mon, 2013-06-10 at 17:54 -0400, Dave Jones wrote:
> On Mon, Jun 10, 2013 at 05:37:36PM -0400, Steven Rostedt wrote:
>  > On Mon, 2013-06-10 at 14:18 -0700, Paul E. McKenney wrote:
>  > > On Mon, Jun 10, 2013 at 05:01:23PM -0400, Dave Jones wrote:
>  > > > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
>  > > > 
>  > > >  > > I saw some of Steven's patches get merged on Friday, is there anything else
>  > > >  > > outstanding that didn't make it in yet that I could test ?
>  > > >  > > Or is this another new bug ?
>  > > >  > 
>  > > >  > I have three fixes queued up at:
>  > > >  > 
>  > > >  > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
>  > > >  > 
>  > > >  > Kind of hard to tell whether they are relevant given the interleaved
>  > > >  > stack traces, but can't hurt to try them out.
>  > > > 
>  > > > Here's another. Looks different.
>  > > 
>  > > I bet that commit d6284099 (trace: Allow idle-safe tracepoints to be
>  > > called from irq) from the above git archive fixes this one.  Just don't
>  > > ask how much I am willing to bet.  ;-)
>  > 
>  > Don't bet much ;-)  This has nothing to do with tracepoints. It's due to
>  > the function tracer. 
> 
> dammit, 20 minutes after I finally cloned the repo.
> Can we go back to posting diffs instead of hashes please ?
> 
> So while updating my list of bugs I've found this cycle, I noticed
> I'd already posted this one a month ago on -rc2.
> https://lkml.org/lkml/2013/5/21/327
> which led us to this patch https://lkml.org/lkml/2013/5/24/379
> After which I hit a bunch of what seem to be other RCU related bugs.
> 
> So maybe that patch was a winner after all and got dropped ?

I don't think it got dropped, it just seemed that it would go through
Peter:

https://lkml.org/lkml/2013/6/6/456

-- Steve

> 
> 	Dave
>  



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 22:10           ` Paul E. McKenney
@ 2013-06-10 22:25             ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2013-06-10 22:25 UTC (permalink / raw)
  To: paulmck; +Cc: Dave Jones, Linux Kernel, fweisbec

On Mon, 2013-06-10 at 15:10 -0700, Paul E. McKenney wrote:

> > So maybe that patch was a winner after all and got dropped ?
> 
> Please see below for the patch that I was thinking of.
> 
> It has not been dropped, I was on travel and a bit slow about pushing
> things.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> trace: Allow idle-safe tracepoints to be called from irq
> 
> __DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
> may safely be invoked from what RCU considers to be an idle CPU.
> However, these _rcuidle() tracepoints may -not- be invoked from the
> handler of an irq taken from idle, because rcu_idle_enter() zeroes
> RCU's nesting-level counter, so that the rcu_irq_exit() returning to
> idle will trigger a WARN_ON_ONCE().
> 
> This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
> and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
> tracepoints usable from irq handlers as well as from process context.
> 
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 2f322c3..f8e084d 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -145,8 +145,8 @@ static inline void tracepoint_synchronize_unregister(void)
>  				TP_PROTO(data_proto),			\
>  				TP_ARGS(data_args),			\
>  				TP_CONDITION(cond),			\
> -				rcu_idle_exit(),			\
> -				rcu_idle_enter());			\
> +				rcu_irq_enter(),			\
> +				rcu_irq_exit());			\

Still wont fix the splat that Dave saw. As the splat has nothing to do
with tracepoints, but has to do with function tracing, which is a whole
'nother beast.

-- Steve

>  	}
>  #else
>  #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args)



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-10 21:16     ` Steven Rostedt
@ 2013-06-18  9:58       ` Peter Zijlstra
  2013-06-18 14:40         ` Dave Jones
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2013-06-18  9:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dave Jones, Paul E. McKenney, Linux Kernel, fweisbec, mingo

On Mon, Jun 10, 2013 at 05:16:48PM -0400, Steven Rostedt wrote:
> On Mon, 2013-06-10 at 17:01 -0400, Dave Jones wrote:
> > On Mon, Jun 10, 2013 at 01:33:55PM -0700, Paul E. McKenney wrote:
> >  
> >  > > I saw some of Steven's patches get merged on Friday, is there anything else
> >  > > outstanding that didn't make it in yet that I could test ?
> >  > > Or is this another new bug ?
> >  > 
> >  > I have three fixes queued up at:
> >  > 
> >  > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/urgent
> >  > 
> >  > Kind of hard to tell whether they are relevant given the interleaved
> >  > stack traces, but can't hurt to try them out.
> >  
> > Here's another. Looks different.
> > 
> > [ 2739.921649] ===============================
> > [ 2739.923894] [ INFO: suspicious RCU usage. ]
> > [ 2739.926144] 3.10.0-rc5+ #6 Not tainted
> > [ 2739.928397] -------------------------------
> > [ 2739.930670] include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
> > [ 2739.933826] 
> > other info that might help us debug this:
> > 
> > [ 2739.939663] 
> > RCU used illegally from idle CPU!
> > rcu_scheduler_active = 1, debug_locks = 0
> > [ 2739.946345] RCU used illegally from extended quiescent state!
> > [ 2739.949123] 2 locks held by trinity-child1/4385:
> > [ 2739.951537]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816ea16f>] __schedule+0xef/0x9c0
> > [ 2739.955316]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810a5625>] cpuacct_charge+0x5/0x1f0
> > [ 2739.959101] 
> > stack backtrace:
> > [ 2739.962529] CPU: 1 PID: 4385 Comm: trinity-child1 Not tainted 3.10.0-rc5+ #6
> > [ 2739.970870]  0000000000000000 ffff8802247e3cf8 ffffffff816e39db ffff8802247e3d28
> > [ 2739.974556]  ffffffff810b5987 ffff880200f02568 000000000032585b ffff880200f02520
> > [ 2739.978353]  0000000000000001 ffff8802247e3d60 ffffffff810a57a5 ffffffff810a5625
> > [ 2739.982052] Call Trace:
> > [ 2739.984098]  [<ffffffff816e39db>] dump_stack+0x19/0x1b
> > [ 2739.986996]  [<ffffffff810b5987>] lockdep_rcu_suspicious+0xe7/0x120
> > [ 2739.990080]  [<ffffffff810a57a5>] cpuacct_charge+0x185/0x1f0
> > [ 2739.992971]  [<ffffffff810a5625>] ? cpuacct_charge+0x5/0x1f0
> > [ 2739.994716]  [<ffffffff8109609c>] update_curr+0xec/0x250
> > [ 2739.995873]  [<ffffffff810975c8>] put_prev_task_fair+0x228/0x480
> > [ 2739.997036]  [<ffffffff816ea1e6>] __schedule+0x166/0x9c0
> > [ 2739.998192]  [<ffffffff816eaf60>] ? __cond_resched_softirq+0x60/0x60
> > [ 2739.999344]  [<ffffffff816eae94>] preempt_schedule+0x44/0x60
> 
> Yeah, this one is fixed by a patch I sent out earlier, and I believe
> Peter Zijlstra is going to push it. It wasn't part of my queue.
> 
> Peter, Are you going to take the preempt_schedule_context() patch?

I have it queued, I just seem to have some problems locating Ingo to
stuff patches into -tip :/

Will continue prodding.. Ingo if you're reading! :-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-18  9:58       ` Peter Zijlstra
@ 2013-06-18 14:40         ` Dave Jones
  2013-06-18 17:32           ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Jones @ 2013-06-18 14:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Paul E. McKenney, Linux Kernel, fweisbec, mingo,
	Thomas Gleixner

On Tue, Jun 18, 2013 at 11:58:21AM +0200, Peter Zijlstra wrote:

 > > Peter, Are you going to take the preempt_schedule_context() patch?
 > 
 > I have it queued, I just seem to have some problems locating Ingo to
 > stuff patches into -tip :/
 > 
 > Will continue prodding.. Ingo if you're reading! :-)

There's a bunch of other stuff in tip pending pushing to Linus too,
including a bunch of fixes in timers/urgent that have been there
over a month.

Ingo hasn't posted since the 3rd, maybe he's sick/on vacation.
Does anyone else do pushes from tip->Linus ?  tglx ?

	Dave


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Another RCU trace. (3.10-rc5)
  2013-06-18 14:40         ` Dave Jones
@ 2013-06-18 17:32           ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2013-06-18 17:32 UTC (permalink / raw)
  To: Dave Jones, Steven Rostedt, Paul E. McKenney, Linux Kernel,
	fweisbec, mingo, Thomas Gleixner

On Tue, Jun 18, 2013 at 10:40:20AM -0400, Dave Jones wrote:
> On Tue, Jun 18, 2013 at 11:58:21AM +0200, Peter Zijlstra wrote:
> 
>  > > Peter, Are you going to take the preempt_schedule_context() patch?
>  > 
>  > I have it queued, I just seem to have some problems locating Ingo to
>  > stuff patches into -tip :/
>  > 
>  > Will continue prodding.. Ingo if you're reading! :-)
> 
> There's a bunch of other stuff in tip pending pushing to Linus too,
> including a bunch of fixes in timers/urgent that have been there
> over a month.
> 
> Ingo hasn't posted since the 3rd, maybe he's sick/on vacation.
> Does anyone else do pushes from tip->Linus ?  tglx ?
> 

hpa and tglx are the other people with tip commit access. I'll try and
get a hold of tglx; he is currently at a conference but he should be
back the day after tomorrow.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-06-18 17:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-10 19:51 Another RCU trace. (3.10-rc5) Dave Jones
2013-06-10 20:33 ` Paul E. McKenney
2013-06-10 21:01   ` Dave Jones
2013-06-10 21:16     ` Steven Rostedt
2013-06-18  9:58       ` Peter Zijlstra
2013-06-18 14:40         ` Dave Jones
2013-06-18 17:32           ` Peter Zijlstra
2013-06-10 21:18     ` Paul E. McKenney
2013-06-10 21:37       ` Steven Rostedt
2013-06-10 21:54         ` Dave Jones
2013-06-10 22:10           ` Paul E. McKenney
2013-06-10 22:25             ` Steven Rostedt
2013-06-10 22:10           ` Steven Rostedt
2013-06-10 20:38 ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).