* Traceback in ww_mutex test (test_cycle_work) on arm64/x86_64
From: Guenter Roeck @ 2018-09-23 19:57 UTC
  To: Chris Wilson; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Will Deacon

Hi,

when enabling CONFIG_WW_MUTEX_SELFTEST on arm64 or x86_64,
I get the following traceback.

[    3.111852] ------------[ cut here ]------------
[    3.112100] DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
[    3.112753] WARNING: CPU: 1 PID: 771 at kernel/locking/mutex.c:1211 __mutex_unlock_slowpath+0x1a8/0x2e0
[    3.113238] Modules linked in:
[    3.113774] CPU: 1 PID: 771 Comm: kworker/u16:8 Not tainted 4.19.0-rc5-dirty #1
[    3.114025] Hardware name: linux,dummy-virt (DT)
[    3.114587] Workqueue: test-ww_mutex test_cycle_work
[    3.114950] pstate: 40000005 (nZcv daif -PAN -UAO)
[    3.115144] pc : __mutex_unlock_slowpath+0x1a8/0x2e0
[    3.115327] lr : __mutex_unlock_slowpath+0x1a8/0x2e0
[    3.115500] sp : ffff00000b7cbc40
[    3.115647] x29: ffff00000b7cbc40 x28: 0000000000000000 
[    3.115921] x27: ffff00000942f000 x26: ffff00000a204da0 
[    3.116155] x25: ffff00000a1c93d0 x24: ffff000009103cd8 
[    3.116376] x23: ffff00000a1c9000 x22: ffff00000942f000 
[    3.116596] x21: ffff00000b7cbca8 x20: ffff80001c05f8c8 
[    3.116817] x19: 0000000000000000 x18: ffffffffffffffff 
[    3.117036] x17: 0000000000000000 x16: 0000000000000000 
[    3.117256] x15: ffff00000942f808 x14: ffff00008a1c8bb7 
[    3.117476] x13: ffff00000a1c8bc5 x12: ffff00000944f000 
[    3.117695] x11: 0000000005f5e0ff x10: ffff0000094b3000 
[    3.117947] x9 : 0000000000000000 x8 : ffff00000942f808 
[    3.118172] x7 : ffff00000816153c x6 : 0000000000000000 
[    3.118392] x5 : 0000000000000000 x4 : ffff00000b7cc000 
[    3.118612] x3 : 6172e063a21fe200 x2 : ffff00000944fd80 
[    3.118830] x1 : 6172e063a21fe200 x0 : 0000000000000000 
[    3.119169] Call trace:
[    3.119348]  __mutex_unlock_slowpath+0x1a8/0x2e0
[    3.119540]  ww_mutex_unlock+0x48/0xa0
[    3.119709]  test_cycle_work+0x10c/0x220
[    3.119864]  process_one_work+0x29c/0x708
[    3.120016]  worker_thread+0x40/0x458
[    3.120179]  kthread+0x12c/0x130
[    3.120317]  ret_from_fork+0x10/0x18

Debugging shows that the traceback occurs in the following code
in test_cycle_work().

+       err = ww_mutex_lock(cycle->b_mutex, &ctx);
+       if (err == -EDEADLK) {
				# true
+               ww_mutex_unlock(&cycle->a_mutex);
+               ww_mutex_lock_slow(cycle->b_mutex, &ctx);
+               err = ww_mutex_lock(&cycle->a_mutex, &ctx);
				# returns with err == -EDEADLK
+       }
+
+       if (!err)
+               ww_mutex_unlock(cycle->b_mutex);
+       ww_mutex_unlock(&cycle->a_mutex);
				# traceback seen here:
				# unlocks a_mutex even though it was not
				# acquired by this thread
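
For illustration only (not a proposed fix): one way the teardown could
avoid the bogus unlock is to track the result of the second lock attempt
separately, so that only locks actually held get released. The erra
variable and the err = 0 reset below are hypothetical additions, not code
from the selftest:

	int err, erra = 0;

	err = ww_mutex_lock(cycle->b_mutex, &ctx);
	if (err == -EDEADLK) {
		/* back off: drop a_mutex, then take the contended b_mutex */
		err = 0;
		ww_mutex_unlock(&cycle->a_mutex);
		ww_mutex_lock_slow(cycle->b_mutex, &ctx);
		/* the retry itself may fail with -EDEADLK again */
		erra = ww_mutex_lock(&cycle->a_mutex, &ctx);
	}

	if (!err)
		ww_mutex_unlock(cycle->b_mutex);
	if (!erra)
		ww_mutex_unlock(&cycle->a_mutex);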

The problem is quite easy to reproduce with the following qemu command.

qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic -monitor none \
	-kernel arch/arm64/boot/Image -no-reboot -smp 8 -m 512 \
	-device virtio-blk-pci,drive=d0 \
	-drive file=rootfs.ext2,if=none,id=d0,format=raw \
	-append 'console=ttyAMA0 root=/dev/vda rw'

or:

qemu-system-x86_64 \
	-kernel arch/x86/boot/bzImage \
	-M q35 \
	-cpu Skylake-Server \
	-no-reboot -smp 8 -m 1G \
	-usb -device usb-storage,drive=d0 \
	-drive file=rootfs.ext2,if=none,id=d0,format=raw \
	--append 'root=/dev/sda rw rootwait console=ttyS0 console=tty' \
	-nographic

Details don't really matter as long as the number of CPUs is at least 8
(I have not seen the problem with 1, 2, 4, or 6 CPUs). My test system
has 8 CPU cores (times 2 for hyperthreading), so that may be related.

The test case above is clearly wrong if both calls to ww_mutex_lock()
fail with -EDEADLK. Unfortunately I don't know the expected behavior
in this case, so I'll have to pass this on without a proposed fix.

Please let me know if there is anything I can do to help fix the problem.

Thanks,
Guenter

* Re: Traceback in ww_mutex test (test_cycle_work) on arm64/x86_64
From: Will Deacon @ 2018-09-24 10:51 UTC
  To: Guenter Roeck; +Cc: Chris Wilson, linux-kernel, Peter Zijlstra, Ingo Molnar

Hi Guenter,

On Sun, Sep 23, 2018 at 12:57:06PM -0700, Guenter Roeck wrote:
> when enabling CONFIG_WW_MUTEX_SELFTEST on arm64 or x86_64,
> I get the following traceback.
> 
> [    3.111852] ------------[ cut here ]------------
> [    3.112100] DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
> [    3.112753] WARNING: CPU: 1 PID: 771 at kernel/locking/mutex.c:1211 __mutex_unlock_slowpath+0x1a8/0x2e0
> [    3.113238] Modules linked in:
> [    3.113774] CPU: 1 PID: 771 Comm: kworker/u16:8 Not tainted 4.19.0-rc5-dirty #1
> [    3.114025] Hardware name: linux,dummy-virt (DT)
> [    3.114587] Workqueue: test-ww_mutex test_cycle_work
> [    3.114950] pstate: 40000005 (nZcv daif -PAN -UAO)
> [    3.115144] pc : __mutex_unlock_slowpath+0x1a8/0x2e0
> [    3.115327] lr : __mutex_unlock_slowpath+0x1a8/0x2e0
> [    3.115500] sp : ffff00000b7cbc40
> [    3.115647] x29: ffff00000b7cbc40 x28: 0000000000000000 
> [    3.115921] x27: ffff00000942f000 x26: ffff00000a204da0 
> [    3.116155] x25: ffff00000a1c93d0 x24: ffff000009103cd8 
> [    3.116376] x23: ffff00000a1c9000 x22: ffff00000942f000 
> [    3.116596] x21: ffff00000b7cbca8 x20: ffff80001c05f8c8 
> [    3.116817] x19: 0000000000000000 x18: ffffffffffffffff 
> [    3.117036] x17: 0000000000000000 x16: 0000000000000000 
> [    3.117256] x15: ffff00000942f808 x14: ffff00008a1c8bb7 
> [    3.117476] x13: ffff00000a1c8bc5 x12: ffff00000944f000 
> [    3.117695] x11: 0000000005f5e0ff x10: ffff0000094b3000 
> [    3.117947] x9 : 0000000000000000 x8 : ffff00000942f808 
> [    3.118172] x7 : ffff00000816153c x6 : 0000000000000000 
> [    3.118392] x5 : 0000000000000000 x4 : ffff00000b7cc000 
> [    3.118612] x3 : 6172e063a21fe200 x2 : ffff00000944fd80 
> [    3.118830] x1 : 6172e063a21fe200 x0 : 0000000000000000 
> [    3.119169] Call trace:
> [    3.119348]  __mutex_unlock_slowpath+0x1a8/0x2e0
> [    3.119540]  ww_mutex_unlock+0x48/0xa0
> [    3.119709]  test_cycle_work+0x10c/0x220
> [    3.119864]  process_one_work+0x29c/0x708
> [    3.120016]  worker_thread+0x40/0x458
> [    3.120179]  kthread+0x12c/0x130
> [    3.120317]  ret_from_fork+0x10/0x18

Fun: I can reproduce this all the way back to 4.11, when the selftests
were merged!

> Debugging shows that the traceback occurs in the following code
> in test_cycle_work().
> 
> +       err = ww_mutex_lock(cycle->b_mutex, &ctx);
> +       if (err == -EDEADLK) {
> 				# true
> +               ww_mutex_unlock(&cycle->a_mutex);
> +               ww_mutex_lock_slow(cycle->b_mutex, &ctx);
> +               err = ww_mutex_lock(&cycle->a_mutex, &ctx);
> 				# returns with err == -EDEADLK
> +       }
> +
> +       if (!err)
> +               ww_mutex_unlock(cycle->b_mutex);
> +       ww_mutex_unlock(&cycle->a_mutex);
> 				# traceback seen here:
> 				# unlocks a_mutex even though it was not
> 				# acquired by this thread
> 
> Details don't really matter as long as the number of CPUs is at least 8
> (I have not seen the problem with 1, 2, 4, or 6 CPUs). My test system
> has 8 CPU cores (times 2 for hyperthreading), so that may be related.
> 
> The test case above is clearly wrong if both calls to ww_mutex_lock()
> fail with -EDEADLK. Unfortunately I don't know the expected behavior
> in this case, so I'll have to pass this on without a proposed fix.

Yeah, I think the test code isn't robust in the face of
CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which can spuriously return -EDEADLK
from ww_mutex_lock(). It looks like it's assuming that err will always be
reset to 0 when it takes a_mutex the second time. Chris?
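
For reference, here is a sketch (not the selftest's code) of a backoff
loop that tolerates repeated -EDEADLK, real or injected, by retrying
until both locks are held. Function and variable names are illustrative,
and ww_class is assumed to be defined elsewhere, e.g. via
DEFINE_WW_CLASS():

static void lock_both(struct ww_mutex *a, struct ww_mutex *b,
		      struct ww_acquire_ctx *ctx)
{
	struct ww_mutex *contended = NULL;
	int err;

	ww_acquire_init(ctx, &ww_class);
retry:
	if (contended != a) {
		err = ww_mutex_lock(a, ctx);
		if (err == -EDEADLK) {
			/* drop whatever we hold, slow-lock the loser */
			if (contended)
				ww_mutex_unlock(contended);
			ww_mutex_lock_slow(a, ctx);
			contended = a;
			goto retry;
		}
	}
	if (contended != b) {
		err = ww_mutex_lock(b, ctx);
		if (err == -EDEADLK) {
			ww_mutex_unlock(a);	/* a is always held here */
			ww_mutex_lock_slow(b, ctx);
			contended = b;
			goto retry;
		}
	}
	/* both a and b are held; caller unlocks and calls ww_acquire_fini() */
}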

Will

* Re: Traceback in ww_mutex test (test_cycle_work) on arm64/x86_64
From: Guenter Roeck @ 2018-10-02 21:53 UTC
  To: Will Deacon; +Cc: Chris Wilson, linux-kernel, Peter Zijlstra, Ingo Molnar

Hi Will,

On Mon, Sep 24, 2018 at 11:51:33AM +0100, Will Deacon wrote:
> Hi Guenter,
> 
> On Sun, Sep 23, 2018 at 12:57:06PM -0700, Guenter Roeck wrote:
> > when enabling CONFIG_WW_MUTEX_SELFTEST on arm64 or x86_64,
> > I get the following traceback.
> > 
> > [    3.111852] ------------[ cut here ]------------
> > [    3.112100] DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
> > [    3.112753] WARNING: CPU: 1 PID: 771 at kernel/locking/mutex.c:1211 __mutex_unlock_slowpath+0x1a8/0x2e0
> > [    3.113238] Modules linked in:
> > [    3.113774] CPU: 1 PID: 771 Comm: kworker/u16:8 Not tainted 4.19.0-rc5-dirty #1
> > [    3.114025] Hardware name: linux,dummy-virt (DT)
> > [    3.114587] Workqueue: test-ww_mutex test_cycle_work
> > [    3.114950] pstate: 40000005 (nZcv daif -PAN -UAO)
> > [    3.115144] pc : __mutex_unlock_slowpath+0x1a8/0x2e0
> > [    3.115327] lr : __mutex_unlock_slowpath+0x1a8/0x2e0
> > [    3.115500] sp : ffff00000b7cbc40
> > [    3.115647] x29: ffff00000b7cbc40 x28: 0000000000000000 
> > [    3.115921] x27: ffff00000942f000 x26: ffff00000a204da0 
> > [    3.116155] x25: ffff00000a1c93d0 x24: ffff000009103cd8 
> > [    3.116376] x23: ffff00000a1c9000 x22: ffff00000942f000 
> > [    3.116596] x21: ffff00000b7cbca8 x20: ffff80001c05f8c8 
> > [    3.116817] x19: 0000000000000000 x18: ffffffffffffffff 
> > [    3.117036] x17: 0000000000000000 x16: 0000000000000000 
> > [    3.117256] x15: ffff00000942f808 x14: ffff00008a1c8bb7 
> > [    3.117476] x13: ffff00000a1c8bc5 x12: ffff00000944f000 
> > [    3.117695] x11: 0000000005f5e0ff x10: ffff0000094b3000 
> > [    3.117947] x9 : 0000000000000000 x8 : ffff00000942f808 
> > [    3.118172] x7 : ffff00000816153c x6 : 0000000000000000 
> > [    3.118392] x5 : 0000000000000000 x4 : ffff00000b7cc000 
> > [    3.118612] x3 : 6172e063a21fe200 x2 : ffff00000944fd80 
> > [    3.118830] x1 : 6172e063a21fe200 x0 : 0000000000000000 
> > [    3.119169] Call trace:
> > [    3.119348]  __mutex_unlock_slowpath+0x1a8/0x2e0
> > [    3.119540]  ww_mutex_unlock+0x48/0xa0
> > [    3.119709]  test_cycle_work+0x10c/0x220
> > [    3.119864]  process_one_work+0x29c/0x708
> > [    3.120016]  worker_thread+0x40/0x458
> > [    3.120179]  kthread+0x12c/0x130
> > [    3.120317]  ret_from_fork+0x10/0x18
> 
> Fun: I can reproduce this all the way back to 4.11, when the selftests
> were merged!
> 
Yes, I know. Sorry, I should have mentioned it.

> > Debugging shows that the traceback occurs in the following code
> > in test_cycle_work().
> > 
> > +       err = ww_mutex_lock(cycle->b_mutex, &ctx);
> > +       if (err == -EDEADLK) {
> > 				# true
> > +               ww_mutex_unlock(&cycle->a_mutex);
> > +               ww_mutex_lock_slow(cycle->b_mutex, &ctx);
> > +               err = ww_mutex_lock(&cycle->a_mutex, &ctx);
> > 				# returns with err == -EDEADLK
> > +       }
> > +
> > +       if (!err)
> > +               ww_mutex_unlock(cycle->b_mutex);
> > +       ww_mutex_unlock(&cycle->a_mutex);
> > 				# traceback seen here:
> > 				# unlocks a_mutex even though it was not
> > 				# acquired by this thread
> > 
> > Details don't really matter as long as the number of CPUs is at least 8
> > (I have not seen the problem with 1, 2, 4, or 6 CPUs). My test system
> > has 8 CPU cores (times 2 for hyperthreading), so that may be related.
> > 
> > The test case above is clearly wrong if both calls to ww_mutex_lock()
> > fail with -EDEADLK. Unfortunately I don't know the expected behavior
> > in this case, so I'll have to pass this on without a proposed fix.
> 
> Yeah, I think the test code isn't robust in the face of
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which can spuriously return -EDEADLK
> from ww_mutex_lock(). It looks like it's assuming that err will always be
> reset to 0 when it takes a_mutex the second time. Chris?
> 

I just sent a patch to fix the kernel warning message. That doesn't fix
the test failure itself, but then I have no idea whether this is a real
test failure or a flaw in the test program.

Thanks,
Guenter
