linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* WARN_ON() in lock_accesses()
@ 2015-05-02 10:49 Paul E. McKenney
  2015-05-02 18:13 ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2015-05-02 10:49 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel

Hello!

I got the following while testing Tiny RCU, so this is a UP system with
PREEMPT=n, but with my RCU stack:

[ 1774.636012] WARNING: CPU: 0 PID: 1 at /home/paulmck/public_git/linux-rcu/kernel/locking/lockdep.c:973 __bfs+0x207/0x280()
[ 1774.636012] Modules linked in:
[ 1774.636012] CPU: 0 PID: 1 Comm: init Not tainted 4.1.0-rc1+ #1
[ 1774.636012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1774.636012]  ffffffff81c72ee0 ffff88001e8c7818 ffffffff818eab91 ffff88001e8c7858
[ 1774.636012]  ffffffff8104916f ffff88001e8c7898 ffffffff838c60c0 0000000000000000
[ 1774.636012]  ffffffff81079f20 ffff88001e8c78e8 0000000000000000 ffff88001e8c7868
[ 1774.636012] Call Trace:
[ 1774.636012]  [<ffffffff818eab91>] dump_stack+0x19/0x1b
[ 1774.636012]  [<ffffffff8104916f>] warn_slowpath_common+0x7f/0xc0
[ 1774.636012]  [<ffffffff81079f20>] ? noop_count+0x10/0x10
[ 1774.636012]  [<ffffffff81049255>] warn_slowpath_null+0x15/0x20
[ 1774.636012]  [<ffffffff8107a737>] __bfs+0x207/0x280
[ 1774.636012]  [<ffffffff8107c002>] check_usage_backwards+0x72/0x130
[ 1774.636012]  [<ffffffff8107d9ac>] ? __lock_acquire+0x93c/0x1d50
[ 1774.636012]  [<ffffffff8107bf90>] ? print_shortest_lock_dependencies+0x1d0/0x1d0
[ 1774.636012]  [<ffffffff8107ca72>] mark_lock+0x1c2/0x2c0
[ 1774.636012]  [<ffffffff8107d767>] __lock_acquire+0x6f7/0x1d50
[ 1774.636012]  [<ffffffff8107d97f>] ? __lock_acquire+0x90f/0x1d50
[ 1774.636012]  [<ffffffff8107d4d9>] ? __lock_acquire+0x469/0x1d50
[ 1774.636012]  [<ffffffff8107d9ac>] ? __lock_acquire+0x93c/0x1d50
[ 1774.636012]  [<ffffffff8107f654>] lock_acquire+0xa4/0x130
[ 1774.636012]  [<ffffffff81176bd1>] ? d_walk+0xd1/0x4e0
[ 1774.636012]  [<ffffffff811743d0>] ? select_collect+0xc0/0xc0
[ 1774.636012]  [<ffffffff818f630a>] _raw_spin_lock_nested+0x2a/0x40
[ 1774.636012]  [<ffffffff81176bd1>] ? d_walk+0xd1/0x4e0
[ 1774.636012]  [<ffffffff81176bd1>] d_walk+0xd1/0x4e0
[ 1774.636012]  [<ffffffff81177187>] ? d_invalidate+0xa7/0x100
[ 1774.636012]  [<ffffffff81174570>] ? __d_drop+0xb0/0xb0
[ 1774.636012]  [<ffffffff81177187>] d_invalidate+0xa7/0x100
[ 1774.636012]  [<ffffffff811ccdac>] proc_flush_task+0x9c/0x180
[ 1774.636012]  [<ffffffff810497a7>] release_task+0xa7/0x640
[ 1774.636012]  [<ffffffff81049714>] ? release_task+0x14/0x640
[ 1774.636012]  [<ffffffff8104b004>] wait_consider_task+0x804/0xec0
[ 1774.636012]  [<ffffffff8104b7c0>] ? do_wait+0x100/0x240
[ 1774.636012]  [<ffffffff8104b7c0>] do_wait+0x100/0x240
[ 1774.636012]  [<ffffffff8104bc53>] SyS_wait4+0x63/0xe0
[ 1774.636012]  [<ffffffff81049600>] ? task_stopped_code+0x60/0x60
[ 1774.636012]  [<ffffffff810bc9b7>] C_SYSC_wait4+0xc7/0xd0
[ 1774.636012]  [<ffffffff8107987e>] ? up_read+0x1e/0x40
[ 1774.636012]  [<ffffffff818f77ce>] ? retint_swapgs+0x11/0x16
[ 1774.636012]  [<ffffffff8107ccfd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 1774.636012]  [<ffffffff810bcab9>] compat_SyS_wait4+0x9/0x10
[ 1774.636012]  [<ffffffff8104433b>] sys32_waitpid+0xb/0x10
[ 1774.636012]  [<ffffffff818f88ad>] sysenter_dispatch+0x7/0x1f

Unsurprisingly, this is followed up by a NULL pointer dereference.

A quick look at the code suggests that someone might have gotten a
pointer to a held lock, then somehow that lock was released.  Which
does not seem at all likely.

Hints?

In the meantime, I will check for reproducibility.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: WARN_ON() in lock_accesses()
  2015-05-02 10:49 WARN_ON() in lock_accesses() Paul E. McKenney
@ 2015-05-02 18:13 ` Paul E. McKenney
  2015-05-03  7:53   ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2015-05-02 18:13 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel

On Sat, May 02, 2015 at 03:49:59AM -0700, Paul E. McKenney wrote:
> Hello!
> 
> I got the following while testing Tiny RCU, so this is a UP system with
> PREEMPT=n, but with my RCU stack:
> 
> [ 1774.636012] WARNING: CPU: 0 PID: 1 at /home/paulmck/public_git/linux-rcu/kernel/locking/lockdep.c:973 __bfs+0x207/0x280()
> [ 1774.636012] Modules linked in:
> [ 1774.636012] CPU: 0 PID: 1 Comm: init Not tainted 4.1.0-rc1+ #1
> [ 1774.636012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [ 1774.636012]  ffffffff81c72ee0 ffff88001e8c7818 ffffffff818eab91 ffff88001e8c7858
> [ 1774.636012]  ffffffff8104916f ffff88001e8c7898 ffffffff838c60c0 0000000000000000
> [ 1774.636012]  ffffffff81079f20 ffff88001e8c78e8 0000000000000000 ffff88001e8c7868
> [ 1774.636012] Call Trace:
> [ 1774.636012]  [<ffffffff818eab91>] dump_stack+0x19/0x1b
> [ 1774.636012]  [<ffffffff8104916f>] warn_slowpath_common+0x7f/0xc0
> [ 1774.636012]  [<ffffffff81079f20>] ? noop_count+0x10/0x10
> [ 1774.636012]  [<ffffffff81049255>] warn_slowpath_null+0x15/0x20
> [ 1774.636012]  [<ffffffff8107a737>] __bfs+0x207/0x280
> [ 1774.636012]  [<ffffffff8107c002>] check_usage_backwards+0x72/0x130
> [ 1774.636012]  [<ffffffff8107d9ac>] ? __lock_acquire+0x93c/0x1d50
> [ 1774.636012]  [<ffffffff8107bf90>] ? print_shortest_lock_dependencies+0x1d0/0x1d0
> [ 1774.636012]  [<ffffffff8107ca72>] mark_lock+0x1c2/0x2c0
> [ 1774.636012]  [<ffffffff8107d767>] __lock_acquire+0x6f7/0x1d50
> [ 1774.636012]  [<ffffffff8107d97f>] ? __lock_acquire+0x90f/0x1d50
> [ 1774.636012]  [<ffffffff8107d4d9>] ? __lock_acquire+0x469/0x1d50
> [ 1774.636012]  [<ffffffff8107d9ac>] ? __lock_acquire+0x93c/0x1d50
> [ 1774.636012]  [<ffffffff8107f654>] lock_acquire+0xa4/0x130
> [ 1774.636012]  [<ffffffff81176bd1>] ? d_walk+0xd1/0x4e0
> [ 1774.636012]  [<ffffffff811743d0>] ? select_collect+0xc0/0xc0
> [ 1774.636012]  [<ffffffff818f630a>] _raw_spin_lock_nested+0x2a/0x40
> [ 1774.636012]  [<ffffffff81176bd1>] ? d_walk+0xd1/0x4e0
> [ 1774.636012]  [<ffffffff81176bd1>] d_walk+0xd1/0x4e0
> [ 1774.636012]  [<ffffffff81177187>] ? d_invalidate+0xa7/0x100
> [ 1774.636012]  [<ffffffff81174570>] ? __d_drop+0xb0/0xb0
> [ 1774.636012]  [<ffffffff81177187>] d_invalidate+0xa7/0x100
> [ 1774.636012]  [<ffffffff811ccdac>] proc_flush_task+0x9c/0x180
> [ 1774.636012]  [<ffffffff810497a7>] release_task+0xa7/0x640
> [ 1774.636012]  [<ffffffff81049714>] ? release_task+0x14/0x640
> [ 1774.636012]  [<ffffffff8104b004>] wait_consider_task+0x804/0xec0
> [ 1774.636012]  [<ffffffff8104b7c0>] ? do_wait+0x100/0x240
> [ 1774.636012]  [<ffffffff8104b7c0>] do_wait+0x100/0x240
> [ 1774.636012]  [<ffffffff8104bc53>] SyS_wait4+0x63/0xe0
> [ 1774.636012]  [<ffffffff81049600>] ? task_stopped_code+0x60/0x60
> [ 1774.636012]  [<ffffffff810bc9b7>] C_SYSC_wait4+0xc7/0xd0
> [ 1774.636012]  [<ffffffff8107987e>] ? up_read+0x1e/0x40
> [ 1774.636012]  [<ffffffff818f77ce>] ? retint_swapgs+0x11/0x16
> [ 1774.636012]  [<ffffffff8107ccfd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [ 1774.636012]  [<ffffffff810bcab9>] compat_SyS_wait4+0x9/0x10
> [ 1774.636012]  [<ffffffff8104433b>] sys32_waitpid+0xb/0x10
> [ 1774.636012]  [<ffffffff818f88ad>] sysenter_dispatch+0x7/0x1f
> 
> Unsurprisingly, this is followed up by a NULL pointer dereference.
> 
> A quick look at the code suggests that someone might have gotten a
> pointer to a held lock, then somehow that lock was released.  Which
> does not seem at all likely.
> 
> Hints?
> 
> In the meantime, I will check for reproducibility.

With two failures thus far, MTBF at about 18 hours.  Hmmm...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: WARN_ON() in lock_accesses()
  2015-05-02 18:13 ` Paul E. McKenney
@ 2015-05-03  7:53   ` Paul E. McKenney
  2015-05-05 18:42     ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2015-05-03  7:53 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel

On Sat, May 02, 2015 at 11:13:43AM -0700, Paul E. McKenney wrote:
> On Sat, May 02, 2015 at 03:49:59AM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > I got the following while testing Tiny RCU, so this is a UP system with
> > PREEMPT=n, but with my RCU stack:
> > 
> > [ 1774.636012] WARNING: CPU: 0 PID: 1 at /home/paulmck/public_git/linux-rcu/kernel/locking/lockdep.c:973 __bfs+0x207/0x280()
> > [ 1774.636012] Modules linked in:
> > [ 1774.636012] CPU: 0 PID: 1 Comm: init Not tainted 4.1.0-rc1+ #1
> > [ 1774.636012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 1774.636012]  ffffffff81c72ee0 ffff88001e8c7818 ffffffff818eab91 ffff88001e8c7858
> > [ 1774.636012]  ffffffff8104916f ffff88001e8c7898 ffffffff838c60c0 0000000000000000
> > [ 1774.636012]  ffffffff81079f20 ffff88001e8c78e8 0000000000000000 ffff88001e8c7868
> > [ 1774.636012] Call Trace:
> > [ 1774.636012]  [<ffffffff818eab91>] dump_stack+0x19/0x1b
> > [ 1774.636012]  [<ffffffff8104916f>] warn_slowpath_common+0x7f/0xc0
> > [ 1774.636012]  [<ffffffff81079f20>] ? noop_count+0x10/0x10
> > [ 1774.636012]  [<ffffffff81049255>] warn_slowpath_null+0x15/0x20
> > [ 1774.636012]  [<ffffffff8107a737>] __bfs+0x207/0x280
> > [ 1774.636012]  [<ffffffff8107c002>] check_usage_backwards+0x72/0x130
> > [ 1774.636012]  [<ffffffff8107d9ac>] ? __lock_acquire+0x93c/0x1d50
> > [ 1774.636012]  [<ffffffff8107bf90>] ? print_shortest_lock_dependencies+0x1d0/0x1d0
> > [ 1774.636012]  [<ffffffff8107ca72>] mark_lock+0x1c2/0x2c0
> > [ 1774.636012]  [<ffffffff8107d767>] __lock_acquire+0x6f7/0x1d50
> > [ 1774.636012]  [<ffffffff8107d97f>] ? __lock_acquire+0x90f/0x1d50
> > [ 1774.636012]  [<ffffffff8107d4d9>] ? __lock_acquire+0x469/0x1d50
> > [ 1774.636012]  [<ffffffff8107d9ac>] ? __lock_acquire+0x93c/0x1d50
> > [ 1774.636012]  [<ffffffff8107f654>] lock_acquire+0xa4/0x130
> > [ 1774.636012]  [<ffffffff81176bd1>] ? d_walk+0xd1/0x4e0
> > [ 1774.636012]  [<ffffffff811743d0>] ? select_collect+0xc0/0xc0
> > [ 1774.636012]  [<ffffffff818f630a>] _raw_spin_lock_nested+0x2a/0x40
> > [ 1774.636012]  [<ffffffff81176bd1>] ? d_walk+0xd1/0x4e0
> > [ 1774.636012]  [<ffffffff81176bd1>] d_walk+0xd1/0x4e0
> > [ 1774.636012]  [<ffffffff81177187>] ? d_invalidate+0xa7/0x100
> > [ 1774.636012]  [<ffffffff81174570>] ? __d_drop+0xb0/0xb0
> > [ 1774.636012]  [<ffffffff81177187>] d_invalidate+0xa7/0x100
> > [ 1774.636012]  [<ffffffff811ccdac>] proc_flush_task+0x9c/0x180
> > [ 1774.636012]  [<ffffffff810497a7>] release_task+0xa7/0x640
> > [ 1774.636012]  [<ffffffff81049714>] ? release_task+0x14/0x640
> > [ 1774.636012]  [<ffffffff8104b004>] wait_consider_task+0x804/0xec0
> > [ 1774.636012]  [<ffffffff8104b7c0>] ? do_wait+0x100/0x240
> > [ 1774.636012]  [<ffffffff8104b7c0>] do_wait+0x100/0x240
> > [ 1774.636012]  [<ffffffff8104bc53>] SyS_wait4+0x63/0xe0
> > [ 1774.636012]  [<ffffffff81049600>] ? task_stopped_code+0x60/0x60
> > [ 1774.636012]  [<ffffffff810bc9b7>] C_SYSC_wait4+0xc7/0xd0
> > [ 1774.636012]  [<ffffffff8107987e>] ? up_read+0x1e/0x40
> > [ 1774.636012]  [<ffffffff818f77ce>] ? retint_swapgs+0x11/0x16
> > [ 1774.636012]  [<ffffffff8107ccfd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> > [ 1774.636012]  [<ffffffff810bcab9>] compat_SyS_wait4+0x9/0x10
> > [ 1774.636012]  [<ffffffff8104433b>] sys32_waitpid+0xb/0x10
> > [ 1774.636012]  [<ffffffff818f88ad>] sysenter_dispatch+0x7/0x1f
> > 
> > Unsurprisingly, this is followed up by a NULL pointer dereference.
> > 
> > A quick look at the code suggests that someone might have gotten a
> > pointer to a held lock, then somehow that lock was released.  Which
> > does not seem at all likely.
> > 
> > Hints?
> > 
> > In the meantime, I will check for reproducibility.
> 
> With two failures thus far, MTBF at about 18 hours.  Hmmm...

Currently looks like something my for-4.2 commits introduced.  Sorry for the
bother, tracking it down.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: WARN_ON() in lock_accesses()
  2015-05-03  7:53   ` Paul E. McKenney
@ 2015-05-05 18:42     ` Paul E. McKenney
  0 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2015-05-05 18:42 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel

On Sun, May 03, 2015 at 12:53:16AM -0700, Paul E. McKenney wrote:
> On Sat, May 02, 2015 at 11:13:43AM -0700, Paul E. McKenney wrote:
> > On Sat, May 02, 2015 at 03:49:59AM -0700, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > I got the following while testing Tiny RCU, so this is a UP system with
> > > PREEMPT=n, but with my RCU stack:
> > > 
> > > [ 1774.636012] WARNING: CPU: 0 PID: 1 at /home/paulmck/public_git/linux-rcu/kernel/locking/lockdep.c:973 __bfs+0x207/0x280()
> > > [ 1774.636012] Modules linked in:
> > > [ 1774.636012] CPU: 0 PID: 1 Comm: init Not tainted 4.1.0-rc1+ #1
> > > [ 1774.636012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > [ 1774.636012]  ffffffff81c72ee0 ffff88001e8c7818 ffffffff818eab91 ffff88001e8c7858
> > > [ 1774.636012]  ffffffff8104916f ffff88001e8c7898 ffffffff838c60c0 0000000000000000
> > > [ 1774.636012]  ffffffff81079f20 ffff88001e8c78e8 0000000000000000 ffff88001e8c7868
> > > [ 1774.636012] Call Trace:
> > > [ 1774.636012]  [<ffffffff818eab91>] dump_stack+0x19/0x1b
> > > [ 1774.636012]  [<ffffffff8104916f>] warn_slowpath_common+0x7f/0xc0
> > > [ 1774.636012]  [<ffffffff81079f20>] ? noop_count+0x10/0x10
> > > [ 1774.636012]  [<ffffffff81049255>] warn_slowpath_null+0x15/0x20
> > > [ 1774.636012]  [<ffffffff8107a737>] __bfs+0x207/0x280
> > > [ 1774.636012]  [<ffffffff8107c002>] check_usage_backwards+0x72/0x130
> > > [ 1774.636012]  [<ffffffff8107d9ac>] ? __lock_acquire+0x93c/0x1d50
> > > [ 1774.636012]  [<ffffffff8107bf90>] ? print_shortest_lock_dependencies+0x1d0/0x1d0
> > > [ 1774.636012]  [<ffffffff8107ca72>] mark_lock+0x1c2/0x2c0
> > > [ 1774.636012]  [<ffffffff8107d767>] __lock_acquire+0x6f7/0x1d50
> > > [ 1774.636012]  [<ffffffff8107d97f>] ? __lock_acquire+0x90f/0x1d50
> > > [ 1774.636012]  [<ffffffff8107d4d9>] ? __lock_acquire+0x469/0x1d50
> > > [ 1774.636012]  [<ffffffff8107d9ac>] ? __lock_acquire+0x93c/0x1d50
> > > [ 1774.636012]  [<ffffffff8107f654>] lock_acquire+0xa4/0x130
> > > [ 1774.636012]  [<ffffffff81176bd1>] ? d_walk+0xd1/0x4e0
> > > [ 1774.636012]  [<ffffffff811743d0>] ? select_collect+0xc0/0xc0
> > > [ 1774.636012]  [<ffffffff818f630a>] _raw_spin_lock_nested+0x2a/0x40
> > > [ 1774.636012]  [<ffffffff81176bd1>] ? d_walk+0xd1/0x4e0
> > > [ 1774.636012]  [<ffffffff81176bd1>] d_walk+0xd1/0x4e0
> > > [ 1774.636012]  [<ffffffff81177187>] ? d_invalidate+0xa7/0x100
> > > [ 1774.636012]  [<ffffffff81174570>] ? __d_drop+0xb0/0xb0
> > > [ 1774.636012]  [<ffffffff81177187>] d_invalidate+0xa7/0x100
> > > [ 1774.636012]  [<ffffffff811ccdac>] proc_flush_task+0x9c/0x180
> > > [ 1774.636012]  [<ffffffff810497a7>] release_task+0xa7/0x640
> > > [ 1774.636012]  [<ffffffff81049714>] ? release_task+0x14/0x640
> > > [ 1774.636012]  [<ffffffff8104b004>] wait_consider_task+0x804/0xec0
> > > [ 1774.636012]  [<ffffffff8104b7c0>] ? do_wait+0x100/0x240
> > > [ 1774.636012]  [<ffffffff8104b7c0>] do_wait+0x100/0x240
> > > [ 1774.636012]  [<ffffffff8104bc53>] SyS_wait4+0x63/0xe0
> > > [ 1774.636012]  [<ffffffff81049600>] ? task_stopped_code+0x60/0x60
> > > [ 1774.636012]  [<ffffffff810bc9b7>] C_SYSC_wait4+0xc7/0xd0
> > > [ 1774.636012]  [<ffffffff8107987e>] ? up_read+0x1e/0x40
> > > [ 1774.636012]  [<ffffffff818f77ce>] ? retint_swapgs+0x11/0x16
> > > [ 1774.636012]  [<ffffffff8107ccfd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> > > [ 1774.636012]  [<ffffffff810bcab9>] compat_SyS_wait4+0x9/0x10
> > > [ 1774.636012]  [<ffffffff8104433b>] sys32_waitpid+0xb/0x10
> > > [ 1774.636012]  [<ffffffff818f88ad>] sysenter_dispatch+0x7/0x1f
> > > 
> > > Unsurprisingly, this is followed up by a NULL pointer dereference.
> > > 
> > > A quick look at the code suggests that someone might have gotten a
> > > pointer to a held lock, then somehow that lock was released.  Which
> > > does not seem at all likely.
> > > 
> > > Hints?
> > > 
> > > In the meantime, I will check for reproducibility.
> > 
> > With two failures thus far, MTBF at about 18 hours.  Hmmm...
> 
> Currently looks like something my for-4.2 commits introduced.  Sorry for the
> bother, tracking it down.

And this bisected to a commit that forced rcutorture to use Tiny RCU's
RCU-bh.  So the next step is to backport this patch and see what happens.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-05-05 18:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-02 10:49 WARN_ON() in lock_accesses() Paul E. McKenney
2015-05-02 18:13 ` Paul E. McKenney
2015-05-03  7:53   ` Paul E. McKenney
2015-05-05 18:42     ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).