xen-devel.lists.xenproject.org archive mirror
* Xen Linux deadlock
@ 2017-06-07 15:05 Andre Przywara
  2017-06-07 15:51 ` Juergen Gross
  0 siblings, 1 reply; 2+ messages in thread
From: Andre Przywara @ 2017-06-07 15:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Julien Grall, Stefano Stabellini, Boris Ostrovsky

Hi,

When booting Linux 4.12-rc4 as Dom0 under a recent Xen hypervisor, I saw the
following lockdep splat after running xencommons start:

root@junor1:~# bash /etc/init.d/xencommons start
Setting domain 0 name, domid and JSON config...
[  247.979498] ======================================================
[  247.985688] WARNING: possible circular locking dependency detected
[  247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
[  247.997040] ------------------------------------------------------
[  248.003232] xenbus/91 is trying to acquire lock:
[  248.007875]  (&u->msgbuffer_mutex){+.+.+.}, at: [<ffff00000863e904>]
xenbus_dev_queue_reply+0x3c/0x230
[  248.017163]
[  248.017163] but task is already holding lock:
[  248.023096]  (xb_write_mutex){+.+...}, at: [<ffff00000863a940>]
xenbus_thread+0x5f0/0x798
[  248.031267]
[  248.031267] which lock already depends on the new lock.
[  248.031267]
[  248.039615]
[  248.039615] the existing dependency chain (in reverse order) is:
[  248.047176]
[  248.047176] -> #1 (xb_write_mutex){+.+...}:
[  248.052943]        __lock_acquire+0x1728/0x1778
[  248.057498]        lock_acquire+0xc4/0x288
[  248.061630]        __mutex_lock+0x84/0x868
[  248.065755]        mutex_lock_nested+0x3c/0x50
[  248.070227]        xs_send+0x164/0x1f8
[  248.074015]        xenbus_dev_request_and_reply+0x6c/0x88
[  248.079427]        xenbus_file_write+0x260/0x420
[  248.084073]        __vfs_write+0x48/0x138
[  248.088113]        vfs_write+0xa8/0x1b8
[  248.091983]        SyS_write+0x54/0xb0
[  248.095768]        el0_svc_naked+0x24/0x28
[  248.099897]
[  248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
[  248.106088]        print_circular_bug+0x80/0x2e0
[  248.110730]        __lock_acquire+0x1768/0x1778
[  248.115288]        lock_acquire+0xc4/0x288
[  248.119417]        __mutex_lock+0x84/0x868
[  248.123545]        mutex_lock_nested+0x3c/0x50
[  248.128016]        xenbus_dev_queue_reply+0x3c/0x230
[  248.133005]        xenbus_thread+0x788/0x798
[  248.137306]        kthread+0x110/0x140
[  248.141087]        ret_from_fork+0x10/0x40
[  248.145214]
[  248.145214] other info that might help us debug this:
[  248.145214]
[  248.153383]  Possible unsafe locking scenario:
[  248.153383]
[  248.159403]        CPU0                    CPU1
[  248.163960]        ----                    ----
[  248.168518]   lock(xb_write_mutex);
[  248.172045]                                lock(&u->msgbuffer_mutex);
[  248.178500]                                lock(xb_write_mutex);
[  248.184514]   lock(&u->msgbuffer_mutex);
[  248.188470]
[  248.188470]  *** DEADLOCK ***
[  248.188470]
[  248.194578] 2 locks held by xenbus/91:
[  248.198360]  #0:  (xs_response_mutex){+.+...}, at:
[<ffff00000863a7b0>] xenbus_thread+0x460/0x798
[  248.207218]  #1:  (xb_write_mutex){+.+...}, at: [<ffff00000863a940>]
xenbus_thread+0x5f0/0x798
[  248.215818]
[  248.215818] stack backtrace:
[  248.220293] CPU: 0 PID: 91 Comm: xenbus Not tainted
4.12.0-rc4-00022-gc4b25c0 #575
[  248.227858] Hardware name: ARM Juno development board (r1) (DT)
[  248.233792] Call trace:
[  248.236289] [<ffff00000808a748>] dump_backtrace+0x0/0x270
[  248.241707] [<ffff00000808aa94>] show_stack+0x24/0x30
[  248.246782] [<ffff0000084caa98>] dump_stack+0xb8/0xf0
[  248.251859] [<ffff000008139068>] print_circular_bug+0x1f8/0x2e0
[  248.257787] [<ffff00000813c090>] __lock_acquire+0x1768/0x1778
[  248.263548] [<ffff00000813c90c>] lock_acquire+0xc4/0x288
[  248.268882] [<ffff000008bdb28c>] __mutex_lock+0x84/0x868
[  248.274219] [<ffff000008bdbaac>] mutex_lock_nested+0x3c/0x50
[  248.279889] [<ffff00000863e904>] xenbus_dev_queue_reply+0x3c/0x230
[  248.286081] [<ffff00000863aad8>] xenbus_thread+0x788/0x798
[  248.291585] [<ffff000008108070>] kthread+0x110/0x140
[  248.296572] [<ffff000008083710>] ret_from_fork+0x10/0x40

Apparently it's not easily reproducible, but Julien confirmed that the
deadlock condition reported above is indeed present in the Linux code.
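
To make the inversion concrete, here is a minimal userspace sketch (plain
pthreads, not the kernel code; only the two mutex names are taken from the
lockdep report, everything else is made up for illustration) of the AB-BA
ordering it warns about:

/*
 * Thread A mirrors CPU0 in the splat (xenbus_thread ->
 * xenbus_dev_queue_reply), thread B mirrors CPU1
 * (xenbus_file_write -> xs_send).  With unlucky scheduling each thread
 * ends up waiting forever for the mutex the other one holds.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t xb_write_mutex  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t msgbuffer_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *thread_a(void *arg)        /* xenbus_thread side */
{
    (void)arg;
    pthread_mutex_lock(&xb_write_mutex);     /* held while processing the message */
    pthread_mutex_lock(&msgbuffer_mutex);    /* taken to queue the reply          */
    puts("A: reply queued");
    pthread_mutex_unlock(&msgbuffer_mutex);
    pthread_mutex_unlock(&xb_write_mutex);
    return NULL;
}

static void *thread_b(void *arg)        /* xenbus_file_write side */
{
    (void)arg;
    pthread_mutex_lock(&msgbuffer_mutex);    /* held while parsing the user request */
    pthread_mutex_lock(&xb_write_mutex);     /* taken to send it to xenstore        */
    puts("B: request sent");
    pthread_mutex_unlock(&xb_write_mutex);
    pthread_mutex_unlock(&msgbuffer_mutex);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}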

Does anyone have an idea of how to fix this?

Cheers,
Andre.



* Re: Xen Linux deadlock
  2017-06-07 15:05 Xen Linux deadlock Andre Przywara
@ 2017-06-07 15:51 ` Juergen Gross
  0 siblings, 0 replies; 2+ messages in thread
From: Juergen Gross @ 2017-06-07 15:51 UTC (permalink / raw)
  To: Andre Przywara, xen-devel
  Cc: Julien Grall, Stefano Stabellini, Boris Ostrovsky

On 07/06/17 17:05, Andre Przywara wrote:
> Hi,
> 
> When booting Linux 4.12-rc4 as Dom0 under a recent Xen hypervisor, I saw the
> following lockdep splat after running xencommons start:
> 
> [lockdep splat snipped -- see above]
> 
> Apparently it's not easily reproducible, but Julien confirmed that the
> deadlock condition reported above is indeed present in the Linux code.
> 
> Does anyone have an idea of how to fix this?

Shouldn't be too hard: xb_write_mutex can be dropped earlier in that
critical section. I'll send a patch.
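
In kernel-flavoured terms the idea is roughly the following (only a sketch
of the direction, not the actual patch; everything except the two mutex
names from the report is illustrative):

/* Finish everything that needs xb_write_mutex first ...              */
mutex_lock(&xb_write_mutex);
/* ... e.g. take the completed request off the list while protected.  */
mutex_unlock(&xb_write_mutex);      /* dropped *before* queueing the reply */

/* Only now call into the path that takes u->msgbuffer_mutex, so the   */
/* two mutexes are never nested in this direction.                     */
queue_reply_for_user(req);          /* hypothetical stand-in for the
                                       reply-queueing step */

That removes the xb_write_mutex -> msgbuffer_mutex edge from the dependency
chain lockdep is complaining about.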


Juergen


