* balloon_mutex lockdep complaint at HVM domain destroy
From: Ed Swierk @ 2016-05-25 14:30 UTC
To: xen-devel; +Cc: eswierk
The following lockdep dump occurs whenever I destroy an HVM domain, on
Linux 4.4 Dom0 with CONFIG_XEN_BALLOON=n on recent stable Xen 4.5.
Any clues whether this is a real potential deadlock, or how to silence
it if not?
======================================================
[ INFO: RECLAIM_FS-safe -> RECLAIM_FS-unsafe lock order detected ]
4.4.11-grsec #1 Not tainted
------------------------------------------------------
qemu-system-i38/3338 [HC0[0]:SC0[0]:HE1:SE1] is trying to acquire:
(balloon_mutex){+.+.+.}, at: [<free_xenballooned_pages at balloon.c:690>] ffffffff81430ac3
and this task is already holding:
(&priv->lock){+.+.-.}, at: [<gntdev_release at list.h:204>] ffffffff8143c77f
which would create a new lock dependency:
(&priv->lock){+.+.-.} -> (balloon_mutex){+.+.+.}
but this new dependency connects a RECLAIM_FS-irq-safe lock:
(&priv->lock){+.+.-.}
... which became RECLAIM_FS-irq-safe at:
[<__lock_acquire at lockdep.c:2839>] ffffffff810becc5
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<mn_invl_range_start at gntdev.c:476>] ffffffff8143c3d4
[<mn_invl_page at gntdev.c:490>] ffffffff8143c450
[<__mmu_notifier_invalidate_page at mmu_notifier.c:183>] ffffffff8119de42
[<try_to_unmap_one at mmu_notifier.h:275>] ffffffff811840c2
[<rmap_walk at rmap.c:1689>] ffffffff81185051
[<try_to_unmap at rmap.c:1534>] ffffffff81185497
[<shrink_page_list at vmscan.c:1063>] ffffffff811599b7
[<shrink_inactive_list at spinlock.h:339>] ffffffff8115a489
[<shrink_lruvec at vmscan.c:1942>] ffffffff8115af3a
[<shrink_zone at vmscan.c:2411>] ffffffff8115b1bb
[<kswapd at vmscan.c:3116>] ffffffff8115c1e4
[<kthread at kthread.c:209>] ffffffff8108eccc
[<ret_from_fork at entry_64.S:890>] ffffffff816d706e
to a RECLAIM_FS-irq-unsafe lock:
(balloon_mutex){+.+.+.}
... which became RECLAIM_FS-irq-unsafe at:
... [<mark_held_locks at lockdep.c:2541>] ffffffff810bdd69
[<lockdep_trace_alloc at current.h:14>] ffffffff810c12f9
[<__alloc_pages_nodemask at page_alloc.c:3248>] ffffffff8114e0d1
[<alloc_pages_current at mempolicy.c:2092>] ffffffff81199b36
[<decrease_reservation at balloon.c:501>] ffffffff8143030e
[<alloc_xenballooned_pages at balloon.c:629>] ffffffff81430c94
[<gnttab_alloc_pages at grant-table.c:691>] ffffffff8142f362
[<gntdev_ioctl at gntdev.c:156>] ffffffff8143d208
[<do_vfs_ioctl at ioctl.c:44>] ffffffff811da630
[<SyS_ioctl at ioctl.c:622>] ffffffff811daa64
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(balloon_mutex);
                               local_irq_disable();
                               lock(&priv->lock);
                               lock(balloon_mutex);
  <Interrupt>
    lock(&priv->lock);
*** DEADLOCK ***
1 lock held by qemu-system-i38/3338:
#0: (&priv->lock){+.+.-.}, at: [<gntdev_release at list.h:204>] ffffffff8143c77f
the dependencies between RECLAIM_FS-irq-safe lock and the holding lock:
-> (&priv->lock){+.+.-.} ops: 8996 {
HARDIRQ-ON-W at:
[<__lock_acquire at lockdep.c:2818>] ffffffff810bec71
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<mn_invl_range_start at gntdev.c:476>] ffffffff8143c3d4
[<__mmu_notifier_invalidate_range_start at mmu_notifier.c:197>] ffffffff8119d72a
[<wp_page_copy.isra.73 at mmu_notifier.h:282>] ffffffff81172051
[<do_wp_page at memory.c:2573>] ffffffff81173e55
[<handle_mm_fault at memory.c:3584>] ffffffff81175f26
[<__do_page_fault at fault.c:1491>] ffffffff8105676f
[<do_page_fault at fault.c:1553>] ffffffff81056a49
[<page_fault at entry_64.S:1440>] ffffffff816d87e8
[<SyS_read at read_write.c:570>] ffffffff811c4074
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
SOFTIRQ-ON-W at:
[<__lock_acquire at lockdep.c:2822>] ffffffff810bec9e
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<mn_invl_range_start at gntdev.c:476>] ffffffff8143c3d4
[<__mmu_notifier_invalidate_range_start at mmu_notifier.c:197>] ffffffff8119d72a
[<wp_page_copy.isra.73 at mmu_notifier.h:282>] ffffffff81172051
[<do_wp_page at memory.c:2573>] ffffffff81173e55
[<handle_mm_fault at memory.c:3584>] ffffffff81175f26
[<__do_page_fault at fault.c:1491>] ffffffff8105676f
[<do_page_fault at fault.c:1553>] ffffffff81056a49
[<page_fault at entry_64.S:1440>] ffffffff816d87e8
[<SyS_read at read_write.c:570>] ffffffff811c4074
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
IN-RECLAIM_FS-W at:
[<__lock_acquire at lockdep.c:2839>] ffffffff810becc5
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<mn_invl_range_start at gntdev.c:476>] ffffffff8143c3d4
[<mn_invl_page at gntdev.c:490>] ffffffff8143c450
[<__mmu_notifier_invalidate_page at mmu_notifier.c:183>] ffffffff8119de42
[<try_to_unmap_one at mmu_notifier.h:275>] ffffffff811840c2
[<rmap_walk at rmap.c:1689>] ffffffff81185051
[<try_to_unmap at rmap.c:1534>] ffffffff81185497
[<shrink_page_list at vmscan.c:1063>] ffffffff811599b7
[<shrink_inactive_list at spinlock.h:339>] ffffffff8115a489
[<shrink_lruvec at vmscan.c:1942>] ffffffff8115af3a
[<shrink_zone at vmscan.c:2411>] ffffffff8115b1bb
[<kswapd at vmscan.c:3116>] ffffffff8115c1e4
[<kthread at kthread.c:209>] ffffffff8108eccc
[<ret_from_fork at entry_64.S:890>] ffffffff816d706e
INITIAL USE at:
[<__lock_acquire at lockdep.c:3171>] ffffffff810be85c
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<mn_invl_range_start at gntdev.c:476>] ffffffff8143c3d4
[<__mmu_notifier_invalidate_range_start at mmu_notifier.c:197>] ffffffff8119d72a
[<wp_page_copy.isra.73 at mmu_notifier.h:282>] ffffffff81172051
[<do_wp_page at memory.c:2573>] ffffffff81173e55
[<handle_mm_fault at memory.c:3584>] ffffffff81175f26
[<__do_page_fault at fault.c:1491>] ffffffff8105676f
[<do_page_fault at fault.c:1553>] ffffffff81056a49
[<page_fault at entry_64.S:1440>] ffffffff816d87e8
[<SyS_read at read_write.c:570>] ffffffff811c4074
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
}
... key at: [<__bss_start at ??:?>] ffffffff82f41a30
... acquired at:
[<check_irq_usage at lockdep.c:1654>] ffffffff810bd2fb
[<__lock_acquire at lockdep_states.h:9>] ffffffff810bfa97
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<free_xenballooned_pages at balloon.c:690>] ffffffff81430ac3
[<gnttab_free_pages at grant-table.c:730>] ffffffff8142f3c1
[<gntdev_free_map at gntdev.c:123>] ffffffff8143c541
[<gntdev_put_map at gntdev.c:233>] ffffffff8143c6fa
[<gntdev_release at list.h:204>] ffffffff8143c79a
[<__fput at file_table.c:209>] ffffffff811c5240
[<____fput at file_table.c:245>] ffffffff811c5399
[<task_work_run at task_work.c:117 (discriminator 1)>] ffffffff8108d17e
[<do_exit at exit.c:758>] ffffffff8106bf60
[<do_group_exit at sched.h:2852>] ffffffff8106c84b
[<sys_exit_group at exit.c:898>] ffffffff8106c8cf
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
the dependencies between the lock to be acquired and RECLAIM_FS-irq-unsafe lock:
-> (balloon_mutex){+.+.+.} ops: 1628 {
HARDIRQ-ON-W at:
[<__lock_acquire at lockdep.c:2818>] ffffffff810bec71
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<alloc_xenballooned_pages at balloon.c:649>] ffffffff81430bdb
[<gnttab_alloc_pages at grant-table.c:691>] ffffffff8142f362
[<gntdev_ioctl at gntdev.c:156>] ffffffff8143d208
[<do_vfs_ioctl at ioctl.c:44>] ffffffff811da630
[<SyS_ioctl at ioctl.c:622>] ffffffff811daa64
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
SOFTIRQ-ON-W at:
[<__lock_acquire at lockdep.c:2822>] ffffffff810bec9e
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<alloc_xenballooned_pages at balloon.c:649>] ffffffff81430bdb
[<gnttab_alloc_pages at grant-table.c:691>] ffffffff8142f362
[<gntdev_ioctl at gntdev.c:156>] ffffffff8143d208
[<do_vfs_ioctl at ioctl.c:44>] ffffffff811da630
[<SyS_ioctl at ioctl.c:622>] ffffffff811daa64
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
RECLAIM_FS-ON-W at:
[<mark_held_locks at lockdep.c:2541>] ffffffff810bdd69
[<lockdep_trace_alloc at current.h:14>] ffffffff810c12f9
[<__alloc_pages_nodemask at page_alloc.c:3248>] ffffffff8114e0d1
[<alloc_pages_current at mempolicy.c:2092>] ffffffff81199b36
[<decrease_reservation at balloon.c:501>] ffffffff8143030e
[<alloc_xenballooned_pages at balloon.c:629>] ffffffff81430c94
[<gnttab_alloc_pages at grant-table.c:691>] ffffffff8142f362
[<gntdev_ioctl at gntdev.c:156>] ffffffff8143d208
[<do_vfs_ioctl at ioctl.c:44>] ffffffff811da630
[<SyS_ioctl at ioctl.c:622>] ffffffff811daa64
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
INITIAL USE at:
[<__lock_acquire at lockdep.c:3171>] ffffffff810be85c
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<alloc_xenballooned_pages at balloon.c:649>] ffffffff81430bdb
[<gnttab_alloc_pages at grant-table.c:691>] ffffffff8142f362
[<gntdev_ioctl at gntdev.c:156>] ffffffff8143d208
[<do_vfs_ioctl at ioctl.c:44>] ffffffff811da630
[<SyS_ioctl at ioctl.c:622>] ffffffff811daa64
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
}
... key at: [<efi_scratch at ??:?>] ffffffff81dfb430
... acquired at:
[<check_irq_usage at lockdep.c:1654>] ffffffff810bd2fb
[<__lock_acquire at lockdep_states.h:9>] ffffffff810bfa97
[<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
[<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
[<free_xenballooned_pages at balloon.c:690>] ffffffff81430ac3
[<gnttab_free_pages at grant-table.c:730>] ffffffff8142f3c1
[<gntdev_free_map at gntdev.c:123>] ffffffff8143c541
[<gntdev_put_map at gntdev.c:233>] ffffffff8143c6fa
[<gntdev_release at list.h:204>] ffffffff8143c79a
[<__fput at file_table.c:209>] ffffffff811c5240
[<____fput at file_table.c:245>] ffffffff811c5399
[<task_work_run at task_work.c:117 (discriminator 1)>] ffffffff8108d17e
[<do_exit at exit.c:758>] ffffffff8106bf60
[<do_group_exit at sched.h:2852>] ffffffff8106c84b
[<sys_exit_group at exit.c:898>] ffffffff8106c8cf
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] ffffffff816d6cba
stack backtrace:
CPU: 0 PID: 3338 Comm: qemu-system-i38 Not tainted 4.4.11-grsec #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
0000000000000000 ffffc90007be3a98 ffffffff8137d3f8 0000000000000011
ffffffff82918db0 00000000000000f0 ffffc90007be3ba0 ffffffff810bd253
0000000000000000 0000000000000001 ffff880000000001 0000000007be3ad0
Call Trace:
[<dump_stack at dump_stack.c:53>] dump_stack+0x9a/0xe2
[<check_usage at lockdep.c:1566>] check_usage+0x523/0x580
[<__lock_acquire at lockdep.c:3171>] ? __lock_acquire+0x4cc/0x1d80
[<debug_check_no_locks_freed at lockdep.c:4136>] ? debug_check_no_locks_freed+0xe5/0x1a0
[<trace_hardirqs_on_caller at lockdep.c:2570>] ? trace_hardirqs_on_caller+0x13d/0x1d0
[<check_irq_usage at lockdep.c:1654>] check_irq_usage+0x4b/0xb0
[<__lock_acquire at lockdep_states.h:9>] __lock_acquire+0x1707/0x1d80
[<__lock_acquire at lockdep.c:2035>] ? __lock_acquire+0x1186/0x1d80
[<free_xenballooned_pages at balloon.c:690>] ? free_xenballooned_pages+0x23/0x110
[<lock_acquire at paravirt.h:839>] lock_acquire+0x89/0xc0
[<free_xenballooned_pages at balloon.c:690>] ? free_xenballooned_pages+0x23/0x110
[<mutex_lock_nested at mutex.c:526>] mutex_lock_nested+0x6c/0x480
[<free_xenballooned_pages at balloon.c:690>] ? free_xenballooned_pages+0x23/0x110
[<mark_held_locks at lockdep.c:2541>] ? mark_held_locks+0x79/0xa0
[<mutex_lock_nested at paravirt.h:839>] ? mutex_lock_nested+0x3d3/0x480
[<gntdev_release at list.h:204>] ? gntdev_release+0x1f/0xa0
[<trace_hardirqs_on_caller at lockdep.c:2570>] ? trace_hardirqs_on_caller+0x13d/0x1d0
[<free_xenballooned_pages at balloon.c:690>] free_xenballooned_pages+0x23/0x110
[<gnttab_free_pages at grant-table.c:730>] gnttab_free_pages+0x31/0x40
[<gntdev_free_map at gntdev.c:123>] gntdev_free_map+0x21/0x70
[<gntdev_put_map at gntdev.c:233>] gntdev_put_map+0x8a/0xf0
[<gntdev_release at list.h:204>] gntdev_release+0x3a/0xa0
[<__fput at file_table.c:209>] __fput+0xf0/0x210
[<____fput at file_table.c:245>] ____fput+0x9/0x10
[<task_work_run at task_work.c:117 (discriminator 1)>] task_work_run+0x6e/0xa0
[<do_exit at exit.c:758>] do_exit+0x320/0xb80
[<trace_hardirqs_on_caller at lockdep.c:2570>] ? trace_hardirqs_on_caller+0x13d/0x1d0
[<do_group_exit at sched.h:2852>] do_group_exit+0x4b/0xc0
[<sys_exit_group at exit.c:898>] SyS_exit_group+0xf/0x10
[<entry_SYSCALL_64_fastpath at entry_64.S:596>] entry_SYSCALL_64_fastpath+0x16/0x7e
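If I'm reading the traces right, the cycle is: the release path takes
priv->lock and then balloon_mutex, while alloc_xenballooned_pages() holds
balloon_mutex around an allocation that can enter reclaim, and reclaim
calls back into gntdev's MMU notifier, which wants priv->lock. Here's a
standalone model of that ordering (pthread mutexes standing in for the
kernel locks; purely illustrative, not the actual code):

/* AB-BA model of the two paths lockdep is connecting; illustrative only. */
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t balloon_mutex = PTHREAD_MUTEX_INITIALIZER; /* drivers/xen/balloon.c */
static pthread_mutex_t priv_lock     = PTHREAD_MUTEX_INITIALIZER; /* gntdev priv->lock */

/* Path A (this report): gntdev_release() -> gntdev_put_map() ->
 * gntdev_free_map() -> gnttab_free_pages() -> free_xenballooned_pages() */
static void *release_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&priv_lock);      /* gntdev_release() */
	pthread_mutex_lock(&balloon_mutex);  /* free_xenballooned_pages() */
	pthread_mutex_unlock(&balloon_mutex);
	pthread_mutex_unlock(&priv_lock);
	return NULL;
}

/* Path B: alloc_xenballooned_pages() holds balloon_mutex while allocating;
 * the allocation can enter reclaim, and reclaim fires gntdev's MMU notifier
 * (mn_invl_range_start()), which takes priv->lock. */
static void *alloc_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&balloon_mutex);  /* alloc_xenballooned_pages() */
	pthread_mutex_lock(&priv_lock);      /* reclaim -> mn_invl_range_start() */
	pthread_mutex_unlock(&priv_lock);
	pthread_mutex_unlock(&balloon_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	pthread_create(&a, NULL, release_path, NULL);
	pthread_create(&b, NULL, alloc_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

If both paths run concurrently with unlucky timing, each ends up waiting
for the lock the other holds.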
* Re: balloon_mutex lockdep complaint at HVM domain destroy
From: David Vrabel @ 2016-05-25 16:58 UTC
To: Ed Swierk, xen-devel
On 25/05/16 15:30, Ed Swierk wrote:
> The following lockdep dump occurs whenever I destroy an HVM domain, on
> Linux 4.4 Dom0 with CONFIG_XEN_BALLOON=n on recent stable Xen 4.5.
This occurs in dom0? Or the guest that's being destroyed?
> Any clues whether this is a real potential deadlock, or how to silence
> it if not?
It's a bug but...
> ======================================================
> [ INFO: RECLAIM_FS-safe -> RECLAIM_FS-unsafe lock order detected ]
> 4.4.11-grsec #1 Not tainted
^^^^^^^^^^^^
...this isn't a vanilla kernel? Can you try vanilla 4.6?
Because:
> IN-RECLAIM_FS-W at:
> [<__lock_acquire at lockdep.c:2839>] ffffffff810becc5
> [<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
> [<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
> [<mn_invl_range_start at gntdev.c:476>] ffffffff8143c3d4
> [<mn_invl_page at gntdev.c:490>] ffffffff8143c450
> [<__mmu_notifier_invalidate_page at mmu_notifier.c:183>] ffffffff8119de42
> [<try_to_unmap_one at mmu_notifier.h:275>] ffffffff811840c2
> [<rmap_walk at rmap.c:1689>] ffffffff81185051
> [<try_to_unmap at rmap.c:1534>] ffffffff81185497
> [<shrink_page_list at vmscan.c:1063>] ffffffff811599b7
> [<shrink_inactive_list at spinlock.h:339>] ffffffff8115a489
> [<shrink_lruvec at vmscan.c:1942>] ffffffff8115af3a
> [<shrink_zone at vmscan.c:2411>] ffffffff8115b1bb
> [<kswapd at vmscan.c:3116>] ffffffff8115c1e4
> [<kthread at kthread.c:209>] ffffffff8108eccc
> [<ret_from_fork at entry_64.S:890>] ffffffff816d706e
We should not be reclaiming pages from a gntdev VMA since it's special
(marked as VM_IO).
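To spell out what I mean by "special": the gntdev mmap handler sets the
VMA up roughly like this (from memory -- check gntdev_mmap() in
drivers/xen/gntdev.c in your tree for the exact flags), which is meant to
keep those mappings out of normal memory management:

/*
 * Illustrative only: roughly how a gntdev-style mmap handler marks its
 * VMA as special.  The real flags and ops are in gntdev_mmap().
 */
static int example_gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
{
	vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_ops = &gntdev_vmops;	/* the driver's vm_operations_struct */
	return 0;
}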
David
* Re: balloon_mutex lockdep complaint at HVM domain destroy
From: Ed Swierk @ 2016-05-26 17:35 UTC
To: David Vrabel; +Cc: xen-devel
On Wed, May 25, 2016 at 9:58 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> This occurs in dom0? Or the guest that's being destroyed?
The lockdep warning comes from dom0 when the HVM guest is being destroyed.
> It's a bug but...
>
>> ======================================================
>> [ INFO: RECLAIM_FS-safe -> RECLAIM_FS-unsafe lock order detected ]
>> 4.4.11-grsec #1 Not tainted
> ^^^^^^^^^^^^
> ...this isn't a vanilla kernel? Can you try vanilla 4.6?
I tried vanilla 4.4.11 and got the same result. I'm having trouble
booting 4.6.0 at all--must be another regression in the early Xen boot
code.
> Because:
>
>> IN-RECLAIM_FS-W at:
>> [<__lock_acquire at lockdep.c:2839>] ffffffff810becc5
>> [<lock_acquire at paravirt.h:839>] ffffffff810c0ac9
>> [<mutex_lock_nested at mutex.c:526>] ffffffff816d1b4c
>> [<mn_invl_range_start at gntdev.c:476>] ffffffff8143c3d4
>> [<mn_invl_page at gntdev.c:490>] ffffffff8143c450
>> [<__mmu_notifier_invalidate_page at mmu_notifier.c:183>] ffffffff8119de42
>> [<try_to_unmap_one at mmu_notifier.h:275>] ffffffff811840c2
>> [<rmap_walk at rmap.c:1689>] ffffffff81185051
>> [<try_to_unmap at rmap.c:1534>] ffffffff81185497
>> [<shrink_page_list at vmscan.c:1063>] ffffffff811599b7
>> [<shrink_inactive_list at spinlock.h:339>] ffffffff8115a489
>> [<shrink_lruvec at vmscan.c:1942>] ffffffff8115af3a
>> [<shrink_zone at vmscan.c:2411>] ffffffff8115b1bb
>> [<kswapd at vmscan.c:3116>] ffffffff8115c1e4
>> [<kthread at kthread.c:209>] ffffffff8108eccc
>> [<ret_from_fork at entry_64.S:890>] ffffffff816d706e
>
> We should not be reclaiming pages from a gntdev VMA since it's special
> (marked as VM_IO).
Can you suggest any printks for me to add that might help isolate the issue?
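For instance, would something like this at the end of gntdev_mmap() be a
reasonable way to confirm whether the VMA really ends up with VM_IO set,
or is there a better place to instrument?

	/* Candidate diagnostic (untested, my guess at a useful spot): dump
	 * the final flags of the gntdev VMA before gntdev_mmap() returns. */
	pr_info("gntdev: mmap vma %lx-%lx flags %#lx VM_IO=%d\n",
		vma->vm_start, vma->vm_end, vma->vm_flags,
		!!(vma->vm_flags & VM_IO));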
--Ed