xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* gntdev/gntalloc and fork
@ 2015-04-30 14:47 Marek Marczykowski-Górecki
  2015-05-27 23:45 ` gntdev/gntalloc and fork? - crash in gntdev Marek Marczykowski-Górecki
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-04-30 14:47 UTC (permalink / raw)
  To: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 6965 bytes --]

Hi,

What is the proper way to handle shared pages (either side - using
gntdev or gntalloc) regarding fork and possible exec later? The child
process do not need to access those pages in any way, but will map
different one(s), using newly opened FD to the gntdev/gntalloc device.
Should it unmap them and close FD to the device manually just after the
fork? Or the process using gntdev or gntalloc should prevent using fork
at all?

I'm asking because I get kernel oops[1] in context of such process. This
process uses both gntdev and gntalloc. The PID reported there is a
child, which maps additional pages (using newly opened FD to
/dev/xen/gnt*), but I'm not sure if the crash happens before, after or
at this second mapping (actually vchan connection), or maybe even at
cleanup of this second mapping. The parent process keeps its mappings
for the whole lifetime of its child.  I don't have a 100% reliable way
to reproduce this problem, but it happens quite often when I run such
operations in a loop.

The kernel is vanilla 3.19.3, running on Xen 4.4.2.

The kernel message:
[74376.073464] general protection fault: 0000 [#1] SMP 
[74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
[74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G           O   3.19.3-4.pvops.qubes.x86_64 #1
[74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
[74376.073532] RIP: e030:[<ffffffffa00952c5>]  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
[74376.073543] RSP: e02b:ffff88000032fc08  EFLAGS: 00010292
[74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
[74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100
[74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
[74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
[74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
[74376.073569] FS:  00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
[74376.073574] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
[74376.073582] Stack:
[74376.073584]  ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
[74376.073590]  0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
[74376.073596]  ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
[74376.073603] Call Trace:
[74376.073610]  [<ffffffffa00953c6>] mn_invl_range_start+0x46/0x90 [xen_gntdev]
[74376.073620]  [<ffffffff811e88fb>] __mmu_notifier_invalidate_range_start+0x5b/0x90
[74376.073627]  [<ffffffff811c2a59>] do_wp_page+0x769/0x820
[74376.074031]  [<ffffffff811c4f5c>] handle_mm_fault+0x7fc/0x10c0
[74376.074031]  [<ffffffff813864cd>] ? radix_tree_lookup+0xd/0x10
[74376.074031]  [<ffffffff81061e1c>] __do_page_fault+0x1dc/0x5a0
[74376.074031]  [<ffffffff817560a6>] ? mutex_lock+0x16/0x37
[74376.074031]  [<ffffffffa0008928>] ? evtchn_ioctl+0x118/0x3c0 [xen_evtchn]
[74376.074031]  [<ffffffff812209d8>] ? do_vfs_ioctl+0x2f8/0x4f0
[74376.074031]  [<ffffffff811cafdf>] ? do_munmap+0x29f/0x3b0
[74376.074031]  [<ffffffff81062211>] do_page_fault+0x31/0x70
[74376.074031]  [<ffffffff81759e28>] page_fault+0x28/0x30
[74376.074031] Code: e9 dd fd ff ff 31 c9 31 db e9 20 fe ff ff 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 28 <48> 8b 47 10 48 85 c0 74 4e 4c 8b 00 49 39 d0 73 46 4c 8b 48 08
[74376.074031] RIP  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
[74376.074031]  RSP <ffff88000032fc08>
[74376.091682] ---[ end trace 2b21c5b714eb1071 ]---
[74404.069009] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [qrexec-agent:9379]
[74404.069009] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
[74404.069009] CPU: 2 PID: 9379 Comm: qrexec-agent Tainted: G      D    O   3.19.3-4.pvops.qubes.x86_64 #1
[74404.069009] task: ffff880010e24a00 ti: ffff880002470000 task.ti: ffff880002470000
[74404.069009] RIP: e030:[<ffffffff81757b11>]  [<ffffffff81757b11>] _raw_spin_lock+0x21/0x30
[74404.069009] RSP: e02b:ffff880002473e18  EFLAGS: 00000297
[74404.069009] RAX: 0000000000000040 RBX: ffff880002345c00 RCX: 0000000000018cf8
[74404.069009] RDX: 0000000000000041 RSI: ffff880002345c00 RDI: ffff880012702e60
[74404.069009] RBP: ffff880002473e18 R08: ffff880012702240 R09: 00000001802a0019
[74404.069009] R10: ffffea000049c080 R11: ffffffffa00955bf R12: ffff880012702e70
[74404.069009] R13: ffff880012702e40 R14: ffff8800132c6f20 R15: ffff880012b163c0
[74404.069009] FS:  00007fd8616ca700(0000) GS:ffff880013d00000(0000) knlGS:0000000000000000
[74404.069009] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[74404.069009] CR2: 00007fd8610be098 CR3: 000000000b971000 CR4: 0000000000042660
[74404.069009] Stack:
[74404.069009]  ffff880002473e48 ffffffffa0095452 ffff880002473e48 ffff880002345c00
[74404.069009]  ffff880012702e70 0000000000000000 ffff880002473e78 ffffffff811e8c2e
[74404.069009]  ffff880002473e78 ffff880012702e40 ffff880012702e40 ffff880012d123c8
[74404.069009] Call Trace:
[74404.069009]  [<ffffffffa0095452>] mn_release+0x22/0x130 [xen_gntdev]
[74404.069009]  [<ffffffff811e8c2e>] mmu_notifier_unregister+0x4e/0xe0
[74404.069009]  [<ffffffffa00957c0>] gntdev_release+0x60/0xa0 [xen_gntdev]
[74404.069009]  [<ffffffff8120ec0f>] __fput+0xdf/0x1e0
[74404.069009]  [<ffffffff8120ed5e>] ____fput+0xe/0x10
[74404.069009]  [<ffffffff810b56df>] task_work_run+0xbf/0x100
[74404.069009]  [<ffffffff81014c47>] do_notify_resume+0x97/0xb0
[74404.069009]  [<ffffffff81758127>] int_signal+0x12/0x17
[74404.069009] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 75 04 5d c3 f3 90 0f b6 07 <38> d0 75 f7 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: gntdev/gntalloc and fork? - crash in gntdev
  2015-04-30 14:47 gntdev/gntalloc and fork Marek Marczykowski-Górecki
@ 2015-05-27 23:45 ` Marek Marczykowski-Górecki
  2015-06-17 19:42   ` race condition in xen-gntdev (was: Re: gntdev/gntalloc and fork? - crash in gntdev) Marek Marczykowski-Górecki
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-05-27 23:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel


[-- Attachment #1.1: Type: text/plain, Size: 7233 bytes --]

On Thu, Apr 30, 2015 at 04:47:44PM +0200, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> What is the proper way to handle shared pages (either side - using
> gntdev or gntalloc) regarding fork and possible exec later? The child
> process do not need to access those pages in any way, but will map
> different one(s), using newly opened FD to the gntdev/gntalloc device.
> Should it unmap them and close FD to the device manually just after the
> fork? Or the process using gntdev or gntalloc should prevent using fork
> at all?
> 
> I'm asking because I get kernel oops[1] in context of such process. This
> process uses both gntdev and gntalloc. The PID reported there is a
> child, which maps additional pages (using newly opened FD to
> /dev/xen/gnt*), but I'm not sure if the crash happens before, after or
> at this second mapping (actually vchan connection), or maybe even at
> cleanup of this second mapping. The parent process keeps its mappings
> for the whole lifetime of its child. I don't have a 100% reliable way
> to reproduce this problem, but it happens quite often when I run such
> operations in a loop.

Any ideas? 

> The kernel is vanilla 3.19.3, running on Xen 4.4.2.
> 
> The kernel message:
> [74376.073464] general protection fault: 0000 [#1] SMP 
> [74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
> [74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G           O   3.19.3-4.pvops.qubes.x86_64 #1
> [74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
> [74376.073532] RIP: e030:[<ffffffffa00952c5>]  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
> [74376.073543] RSP: e02b:ffff88000032fc08  EFLAGS: 00010292
> [74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
> [74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100
> [74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
> [74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
> [74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
> [74376.073569] FS:  00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
> [74376.073574] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
> [74376.073582] Stack:
> [74376.073584]  ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
> [74376.073590]  0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
> [74376.073596]  ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
> [74376.073603] Call Trace:
> [74376.073610]  [<ffffffffa00953c6>] mn_invl_range_start+0x46/0x90 [xen_gntdev]
> [74376.073620]  [<ffffffff811e88fb>] __mmu_notifier_invalidate_range_start+0x5b/0x90
> [74376.073627]  [<ffffffff811c2a59>] do_wp_page+0x769/0x820
> [74376.074031]  [<ffffffff811c4f5c>] handle_mm_fault+0x7fc/0x10c0
> [74376.074031]  [<ffffffff813864cd>] ? radix_tree_lookup+0xd/0x10
> [74376.074031]  [<ffffffff81061e1c>] __do_page_fault+0x1dc/0x5a0
> [74376.074031]  [<ffffffff817560a6>] ? mutex_lock+0x16/0x37
> [74376.074031]  [<ffffffffa0008928>] ? evtchn_ioctl+0x118/0x3c0 [xen_evtchn]
> [74376.074031]  [<ffffffff812209d8>] ? do_vfs_ioctl+0x2f8/0x4f0
> [74376.074031]  [<ffffffff811cafdf>] ? do_munmap+0x29f/0x3b0
> [74376.074031]  [<ffffffff81062211>] do_page_fault+0x31/0x70
> [74376.074031]  [<ffffffff81759e28>] page_fault+0x28/0x30
> [74376.074031] Code: e9 dd fd ff ff 31 c9 31 db e9 20 fe ff ff 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 28 <48> 8b 47 10 48 85 c0 74 4e 4c 8b 00 49 39 d0 73 46 4c 8b 48 08
> [74376.074031] RIP  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
> [74376.074031]  RSP <ffff88000032fc08>
> [74376.091682] ---[ end trace 2b21c5b714eb1071 ]---
> [74404.069009] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [qrexec-agent:9379]
> [74404.069009] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
> [74404.069009] CPU: 2 PID: 9379 Comm: qrexec-agent Tainted: G      D    O   3.19.3-4.pvops.qubes.x86_64 #1
> [74404.069009] task: ffff880010e24a00 ti: ffff880002470000 task.ti: ffff880002470000
> [74404.069009] RIP: e030:[<ffffffff81757b11>]  [<ffffffff81757b11>] _raw_spin_lock+0x21/0x30
> [74404.069009] RSP: e02b:ffff880002473e18  EFLAGS: 00000297
> [74404.069009] RAX: 0000000000000040 RBX: ffff880002345c00 RCX: 0000000000018cf8
> [74404.069009] RDX: 0000000000000041 RSI: ffff880002345c00 RDI: ffff880012702e60
> [74404.069009] RBP: ffff880002473e18 R08: ffff880012702240 R09: 00000001802a0019
> [74404.069009] R10: ffffea000049c080 R11: ffffffffa00955bf R12: ffff880012702e70
> [74404.069009] R13: ffff880012702e40 R14: ffff8800132c6f20 R15: ffff880012b163c0
> [74404.069009] FS:  00007fd8616ca700(0000) GS:ffff880013d00000(0000) knlGS:0000000000000000
> [74404.069009] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [74404.069009] CR2: 00007fd8610be098 CR3: 000000000b971000 CR4: 0000000000042660
> [74404.069009] Stack:
> [74404.069009]  ffff880002473e48 ffffffffa0095452 ffff880002473e48 ffff880002345c00
> [74404.069009]  ffff880012702e70 0000000000000000 ffff880002473e78 ffffffff811e8c2e
> [74404.069009]  ffff880002473e78 ffff880012702e40 ffff880012702e40 ffff880012d123c8
> [74404.069009] Call Trace:
> [74404.069009]  [<ffffffffa0095452>] mn_release+0x22/0x130 [xen_gntdev]
> [74404.069009]  [<ffffffff811e8c2e>] mmu_notifier_unregister+0x4e/0xe0
> [74404.069009]  [<ffffffffa00957c0>] gntdev_release+0x60/0xa0 [xen_gntdev]
> [74404.069009]  [<ffffffff8120ec0f>] __fput+0xdf/0x1e0
> [74404.069009]  [<ffffffff8120ed5e>] ____fput+0xe/0x10
> [74404.069009]  [<ffffffff810b56df>] task_work_run+0xbf/0x100
> [74404.069009]  [<ffffffff81014c47>] do_notify_resume+0x97/0xb0
> [74404.069009]  [<ffffffff81758127>] int_signal+0x12/0x17
> [74404.069009] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 75 04 5d c3 f3 90 0f b6 07 <38> d0 75 f7 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* race condition in xen-gntdev (was: Re: gntdev/gntalloc and fork? - crash in gntdev)
  2015-05-27 23:45 ` gntdev/gntalloc and fork? - crash in gntdev Marek Marczykowski-Górecki
@ 2015-06-17 19:42   ` Marek Marczykowski-Górecki
  2015-06-22 17:46     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-06-17 19:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel


[-- Attachment #1.1: Type: text/plain, Size: 9827 bytes --]

On Thu, May 28, 2015 at 01:45:08AM +0200, Marek Marczykowski-Górecki wrote:
> On Thu, Apr 30, 2015 at 04:47:44PM +0200, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > What is the proper way to handle shared pages (either side - using
> > gntdev or gntalloc) regarding fork and possible exec later? The child
> > process do not need to access those pages in any way, but will map
> > different one(s), using newly opened FD to the gntdev/gntalloc device.
> > Should it unmap them and close FD to the device manually just after the
> > fork? Or the process using gntdev or gntalloc should prevent using fork
> > at all?
> > 
> > I'm asking because I get kernel oops[1] in context of such process. This
> > process uses both gntdev and gntalloc. The PID reported there is a
> > child, which maps additional pages (using newly opened FD to
> > /dev/xen/gnt*), but I'm not sure if the crash happens before, after or
> > at this second mapping (actually vchan connection), or maybe even at
> > cleanup of this second mapping. The parent process keeps its mappings
> > for the whole lifetime of its child. I don't have a 100% reliable way
> > to reproduce this problem, but it happens quite often when I run such
> > operations in a loop.
> 
> Any ideas?

I've done some further debugging, below is what I get.

> > The kernel is vanilla 3.19.3, running on Xen 4.4.2.
> > 
> > The kernel message:
> > [74376.073464] general protection fault: 0000 [#1] SMP 
> > [74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
> > [74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G           O   3.19.3-4.pvops.qubes.x86_64 #1
> > [74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
> > [74376.073532] RIP: e030:[<ffffffffa00952c5>]  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]

	static void unmap_if_in_range(struct grant_map *map,
			  unsigned long start, unsigned long end)
	{   
	    unsigned long mstart, mend;
	    int err;

	    if (!map->vma)
		return;

The above crash is at first access to "map"...

> > [74376.073543] RSP: e02b:ffff88000032fc08  EFLAGS: 00010292
> > [74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
> > [74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100

... which is 0xdead000000100100 (LIST_POISON1).


> > [74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
> > [74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
> > [74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
> > [74376.073569] FS:  00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
> > [74376.073574] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
> > [74376.073582] Stack:
> > [74376.073584]  ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
> > [74376.073590]  0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
> > [74376.073596]  ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
> > [74376.073603] Call Trace:
> > [74376.073610]  [<ffffffffa00953c6>] mn_invl_range_start+0x46/0x90 [xen_gntdev]
> > [74376.073620]  [<ffffffff811e88fb>]
> > __mmu_notifier_invalidate_range_start+0x5b/0x90
> > [74376.073627]  [<ffffffff811c2a59>] do_wp_page+0x769/0x820
> > [74376.074031]  [<ffffffff811c4f5c>] handle_mm_fault+0x7fc/0x10c0
> > [74376.074031]  [<ffffffff813864cd>] ? radix_tree_lookup+0xd/0x10
> > [74376.074031]  [<ffffffff81061e1c>] __do_page_fault+0x1dc/0x5a0
> > [74376.074031]  [<ffffffff817560a6>] ? mutex_lock+0x16/0x37
> > [74376.074031]  [<ffffffffa0008928>] ? evtchn_ioctl+0x118/0x3c0
> > [xen_evtchn]
> > [74376.074031]  [<ffffffff812209d8>] ? do_vfs_ioctl+0x2f8/0x4f0
> > [74376.074031]  [<ffffffff811cafdf>] ? do_munmap+0x29f/0x3b0
> > [74376.074031]  [<ffffffff81062211>] do_page_fault+0x31/0x70
> > [74376.074031]  [<ffffffff81759e28>] page_fault+0x28/0x30


	static void mn_invl_range_start(struct mmu_notifier *mn,
			struct mm_struct *mm,
			unsigned long start, unsigned long end)
	{
	    struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
	    struct grant_map *map;

	    spin_lock(&priv->lock);
	    list_for_each_entry(map, &priv->maps, next) {
		unmap_if_in_range(map, start, end);
	    }

mn_invl_range_start+0x46 is the first call to unmap_if_in_range, so
something is wrong with priv->maps list.

So I've searched for all the list_del calls on this list and found one not
guarded by spinlock:

	static int gntdev_release(struct inode *inode, struct file *flip)
	{   
	    struct gntdev_priv *priv = flip->private_data;
	    struct grant_map *map;

	    pr_debug("priv %p\n", priv);

	    while (!list_empty(&priv->maps)) {
		map = list_entry(priv->maps.next, struct grant_map, next);
		list_del(&map->next);
		gntdev_put_map(NULL /* already removed */, map);
	    }
	    WARN_ON(!list_empty(&priv->freeable_maps));

	    if (use_ptemod)
		mmu_notifier_unregister(&priv->mn, priv->mm);
	    kfree(priv);
	    return 0;
	}   

So I see this as:
P1(parent)                 P2(child)
                           1. gntdev_release called
                           2. list destroyed (above loop)

3. page fault occurs, gntdev mmu notifier called
4. priv->lock taken
5. iterate over priv->maps
6. crashed since map is already destroyed

                           7. mmu_notifier_unregister calls:
                           8.   mn_release, which tries to take priv->lock
                           9. this process hangs

So I'd guess the fix would be to move mmu_notifier_unregister before
releasing priv->maps list. Could someone more familiar with this code
confirm this?

> > [74376.074031] Code: e9 dd fd ff ff 31 c9 31 db e9 20 fe ff ff 0f 1f
> > 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 89 fb
> > 48 83 ec 28 <48> 8b 47 10 48 85 c0 74 4e 4c 8b 00 49 39 d0 73 46 4c
> > 8b 48 08
> > [74376.074031] RIP  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0
> > [xen_gntdev]
> > [74376.074031]  RSP <ffff88000032fc08>
> > [74376.091682] ---[ end trace 2b21c5b714eb1071 ]---

And the other process:

> > [74404.069009] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s!
> > [qrexec-agent:9379]
> > [74404.069009] Modules linked in: fuse xt_conntrack ipt_MASQUERADE
> > nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> > nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables
> > intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul
> > crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel
> > nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd
> > udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O)
> > xen_evtchn xen_blkfront
> > [74404.069009] CPU: 2 PID: 9379 Comm: qrexec-agent Tainted: G      D
> > O   3.19.3-4.pvops.qubes.x86_64 #1
> > [74404.069009] task: ffff880010e24a00 ti: ffff880002470000 task.ti:
> > ffff880002470000
> > [74404.069009] RIP: e030:[<ffffffff81757b11>]  [<ffffffff81757b11>]
> > _raw_spin_lock+0x21/0x30
> > [74404.069009] RSP: e02b:ffff880002473e18  EFLAGS: 00000297
> > [74404.069009] RAX: 0000000000000040 RBX: ffff880002345c00 RCX:
> > 0000000000018cf8
> > [74404.069009] RDX: 0000000000000041 RSI: ffff880002345c00 RDI:
> > ffff880012702e60
> > [74404.069009] RBP: ffff880002473e18 R08: ffff880012702240 R09:
> > 00000001802a0019
> > [74404.069009] R10: ffffea000049c080 R11: ffffffffa00955bf R12:
> > ffff880012702e70
> > [74404.069009] R13: ffff880012702e40 R14: ffff8800132c6f20 R15:
> > ffff880012b163c0
> > [74404.069009] FS:  00007fd8616ca700(0000) GS:ffff880013d00000(0000)
> > knlGS:0000000000000000
> > [74404.069009] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [74404.069009] CR2: 00007fd8610be098 CR3: 000000000b971000 CR4:
> > 0000000000042660
> > [74404.069009] Stack:
> > [74404.069009]  ffff880002473e48 ffffffffa0095452 ffff880002473e48
> > ffff880002345c00
> > [74404.069009]  ffff880012702e70 0000000000000000 ffff880002473e78
> > ffffffff811e8c2e
> > [74404.069009]  ffff880002473e78 ffff880012702e40 ffff880012702e40
> > ffff880012d123c8
> > [74404.069009] Call Trace:
> > [74404.069009]  [<ffffffffa0095452>] mn_release+0x22/0x130
> > [xen_gntdev]
> > [74404.069009]  [<ffffffff811e8c2e>]
> > mmu_notifier_unregister+0x4e/0xe0
> > [74404.069009]  [<ffffffffa00957c0>] gntdev_release+0x60/0xa0
> > [xen_gntdev]
> > [74404.069009]  [<ffffffff8120ec0f>] __fput+0xdf/0x1e0
> > [74404.069009]  [<ffffffff8120ed5e>] ____fput+0xe/0x10
> > [74404.069009]  [<ffffffff810b56df>] task_work_run+0xbf/0x100
> > [74404.069009]  [<ffffffff81014c47>] do_notify_resume+0x97/0xb0
> > [74404.069009]  [<ffffffff81758127>] int_signal+0x12/0x17
> > [74404.069009] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> > 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 75 04 5d c3 f3
> > 90 0f b6 07 <38> d0 75 f7 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66
> > 66 90 55

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev (was: Re: gntdev/gntalloc and fork? - crash in gntdev)
  2015-06-17 19:42   ` race condition in xen-gntdev (was: Re: gntdev/gntalloc and fork? - crash in gntdev) Marek Marczykowski-Górecki
@ 2015-06-22 17:46     ` Konrad Rzeszutek Wilk
  2015-06-22 18:13       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-22 17:46 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, dgdegra
  Cc: Boris Ostrovsky, David Vrabel, xen-devel

On Wed, Jun 17, 2015 at 09:42:11PM +0200, Marek Marczykowski-Górecki wrote:
> On Thu, May 28, 2015 at 01:45:08AM +0200, Marek Marczykowski-Górecki wrote:
> > On Thu, Apr 30, 2015 at 04:47:44PM +0200, Marek Marczykowski-Górecki wrote:
> > > Hi,
> > > 
> > > What is the proper way to handle shared pages (either side - using
> > > gntdev or gntalloc) regarding fork and possible exec later? The child
> > > process do not need to access those pages in any way, but will map
> > > different one(s), using newly opened FD to the gntdev/gntalloc device.
> > > Should it unmap them and close FD to the device manually just after the
> > > fork? Or the process using gntdev or gntalloc should prevent using fork
> > > at all?
> > > 
> > > I'm asking because I get kernel oops[1] in context of such process. This
> > > process uses both gntdev and gntalloc. The PID reported there is a
> > > child, which maps additional pages (using newly opened FD to
> > > /dev/xen/gnt*), but I'm not sure if the crash happens before, after or
> > > at this second mapping (actually vchan connection), or maybe even at
> > > cleanup of this second mapping. The parent process keeps its mappings
> > > for the whole lifetime of its child. I don't have a 100% reliable way
> > > to reproduce this problem, but it happens quite often when I run such
> > > operations in a loop.
> > 
> > Any ideas?
> 
> I've done some further debugging, below is what I get.

Woot! Thank you!
> 
> > > The kernel is vanilla 3.19.3, running on Xen 4.4.2.
> > > 
> > > The kernel message:
> > > [74376.073464] general protection fault: 0000 [#1] SMP 
> > > [74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
> > > [74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G           O   3.19.3-4.pvops.qubes.x86_64 #1
> > > [74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
> > > [74376.073532] RIP: e030:[<ffffffffa00952c5>]  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
> 
> 	static void unmap_if_in_range(struct grant_map *map,
> 			  unsigned long start, unsigned long end)
> 	{   
> 	    unsigned long mstart, mend;
> 	    int err;
> 
> 	    if (!map->vma)
> 		return;
> 
> The above crash is at first access to "map"...
> 
> > > [74376.073543] RSP: e02b:ffff88000032fc08  EFLAGS: 00010292
> > > [74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
> > > [74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100
> 
> ... which is 0xdead000000100100 (LIST_POISON1).
> 
> 
> > > [74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
> > > [74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
> > > [74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
> > > [74376.073569] FS:  00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
> > > [74376.073574] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
> > > [74376.073582] Stack:
> > > [74376.073584]  ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
> > > [74376.073590]  0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
> > > [74376.073596]  ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
> > > [74376.073603] Call Trace:
> > > [74376.073610]  [<ffffffffa00953c6>] mn_invl_range_start+0x46/0x90 [xen_gntdev]
> > > [74376.073620]  [<ffffffff811e88fb>]
> > > __mmu_notifier_invalidate_range_start+0x5b/0x90
> > > [74376.073627]  [<ffffffff811c2a59>] do_wp_page+0x769/0x820
> > > [74376.074031]  [<ffffffff811c4f5c>] handle_mm_fault+0x7fc/0x10c0
> > > [74376.074031]  [<ffffffff813864cd>] ? radix_tree_lookup+0xd/0x10
> > > [74376.074031]  [<ffffffff81061e1c>] __do_page_fault+0x1dc/0x5a0
> > > [74376.074031]  [<ffffffff817560a6>] ? mutex_lock+0x16/0x37
> > > [74376.074031]  [<ffffffffa0008928>] ? evtchn_ioctl+0x118/0x3c0
> > > [xen_evtchn]
> > > [74376.074031]  [<ffffffff812209d8>] ? do_vfs_ioctl+0x2f8/0x4f0
> > > [74376.074031]  [<ffffffff811cafdf>] ? do_munmap+0x29f/0x3b0
> > > [74376.074031]  [<ffffffff81062211>] do_page_fault+0x31/0x70
> > > [74376.074031]  [<ffffffff81759e28>] page_fault+0x28/0x30
> 
> 
> 	static void mn_invl_range_start(struct mmu_notifier *mn,
> 			struct mm_struct *mm,
> 			unsigned long start, unsigned long end)
> 	{
> 	    struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
> 	    struct grant_map *map;
> 
> 	    spin_lock(&priv->lock);
> 	    list_for_each_entry(map, &priv->maps, next) {
> 		unmap_if_in_range(map, start, end);
> 	    }
> 
> mn_invl_range_start+0x46 is the first call to unmap_if_in_range, so
> something is wrong with priv->maps list.
> 
> So I've searched for all the list_del calls on this list and found one not
> guarded by spinlock:
> 
> 	static int gntdev_release(struct inode *inode, struct file *flip)
> 	{   
> 	    struct gntdev_priv *priv = flip->private_data;
> 	    struct grant_map *map;
> 
> 	    pr_debug("priv %p\n", priv);
> 
> 	    while (!list_empty(&priv->maps)) {
> 		map = list_entry(priv->maps.next, struct grant_map, next);
> 		list_del(&map->next);
> 		gntdev_put_map(NULL /* already removed */, map);
> 	    }
> 	    WARN_ON(!list_empty(&priv->freeable_maps));
> 
> 	    if (use_ptemod)
> 		mmu_notifier_unregister(&priv->mn, priv->mm);
> 	    kfree(priv);
> 	    return 0;
> 	}   
> 
> So I see this as:
> P1(parent)                 P2(child)
>                            1. gntdev_release called
>                            2. list destroyed (above loop)
> 
> 3. page fault occurs, gntdev mmu notifier called
> 4. priv->lock taken
> 5. iterate over priv->maps


Could we check the map->users?

> 6. crashed since map is already destroyed

Oh, indeed. gntdev_release calls gntdev_put_map which frees the whole
thing! Ouch.
> 
>                            7. mmu_notifier_unregister calls:
>                            8.   mn_release, which tries to take priv->lock
>                            9. this process hangs
> 
> So I'd guess the fix would be to move mmu_notifier_unregister before
> releasing priv->maps list. Could someone more familiar with this code
> confirm this?

Hey Marek,

Could we just add an lock around the 'gntdev_release' loop?
Daniel added this code to guard against a similar race:

commit 16a1d0225e22e4e273e6b60a21db95decde666c2
Author: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Date:   Wed Jan 2 22:57:12 2013 +0000

    xen/gntdev: correctly unmap unlinked maps in mmu notifier
    
    If gntdev_ioctl_unmap_grant_ref is called on a range before unmapping
    it, the entry is removed from priv->maps and the later call to
    mn_invl_range_start won't find it to do the unmapping. Fix this by
    creating another list of freeable maps that the mmu notifier can search
    and use to unmap grants.
    
    Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

Lets see what he thinks.
> 
> > > [74376.074031] Code: e9 dd fd ff ff 31 c9 31 db e9 20 fe ff ff 0f 1f
> > > 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 89 fb
> > > 48 83 ec 28 <48> 8b 47 10 48 85 c0 74 4e 4c 8b 00 49 39 d0 73 46 4c
> > > 8b 48 08
> > > [74376.074031] RIP  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0
> > > [xen_gntdev]
> > > [74376.074031]  RSP <ffff88000032fc08>
> > > [74376.091682] ---[ end trace 2b21c5b714eb1071 ]---
> 
> And the other process:
> 
> > > [74404.069009] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s!
> > > [qrexec-agent:9379]
> > > [74404.069009] Modules linked in: fuse xt_conntrack ipt_MASQUERADE
> > > nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> > > nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables
> > > intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul
> > > crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel
> > > nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd
> > > udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O)
> > > xen_evtchn xen_blkfront
> > > [74404.069009] CPU: 2 PID: 9379 Comm: qrexec-agent Tainted: G      D
> > > O   3.19.3-4.pvops.qubes.x86_64 #1
> > > [74404.069009] task: ffff880010e24a00 ti: ffff880002470000 task.ti:
> > > ffff880002470000
> > > [74404.069009] RIP: e030:[<ffffffff81757b11>]  [<ffffffff81757b11>]
> > > _raw_spin_lock+0x21/0x30
> > > [74404.069009] RSP: e02b:ffff880002473e18  EFLAGS: 00000297
> > > [74404.069009] RAX: 0000000000000040 RBX: ffff880002345c00 RCX:
> > > 0000000000018cf8
> > > [74404.069009] RDX: 0000000000000041 RSI: ffff880002345c00 RDI:
> > > ffff880012702e60
> > > [74404.069009] RBP: ffff880002473e18 R08: ffff880012702240 R09:
> > > 00000001802a0019
> > > [74404.069009] R10: ffffea000049c080 R11: ffffffffa00955bf R12:
> > > ffff880012702e70
> > > [74404.069009] R13: ffff880012702e40 R14: ffff8800132c6f20 R15:
> > > ffff880012b163c0
> > > [74404.069009] FS:  00007fd8616ca700(0000) GS:ffff880013d00000(0000)
> > > knlGS:0000000000000000
> > > [74404.069009] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [74404.069009] CR2: 00007fd8610be098 CR3: 000000000b971000 CR4:
> > > 0000000000042660
> > > [74404.069009] Stack:
> > > [74404.069009]  ffff880002473e48 ffffffffa0095452 ffff880002473e48
> > > ffff880002345c00
> > > [74404.069009]  ffff880012702e70 0000000000000000 ffff880002473e78
> > > ffffffff811e8c2e
> > > [74404.069009]  ffff880002473e78 ffff880012702e40 ffff880012702e40
> > > ffff880012d123c8
> > > [74404.069009] Call Trace:
> > > [74404.069009]  [<ffffffffa0095452>] mn_release+0x22/0x130
> > > [xen_gntdev]
> > > [74404.069009]  [<ffffffff811e8c2e>]
> > > mmu_notifier_unregister+0x4e/0xe0
> > > [74404.069009]  [<ffffffffa00957c0>] gntdev_release+0x60/0xa0
> > > [xen_gntdev]
> > > [74404.069009]  [<ffffffff8120ec0f>] __fput+0xdf/0x1e0
> > > [74404.069009]  [<ffffffff8120ed5e>] ____fput+0xe/0x10
> > > [74404.069009]  [<ffffffff810b56df>] task_work_run+0xbf/0x100
> > > [74404.069009]  [<ffffffff81014c47>] do_notify_resume+0x97/0xb0
> > > [74404.069009]  [<ffffffff81758127>] int_signal+0x12/0x17
> > > [74404.069009] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> > > 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 75 04 5d c3 f3
> > > 90 0f b6 07 <38> d0 75 f7 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66
> > > 66 90 55
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev (was: Re: gntdev/gntalloc and fork? - crash in gntdev)
  2015-06-22 17:46     ` Konrad Rzeszutek Wilk
@ 2015-06-22 18:13       ` Marek Marczykowski-Górecki
  2015-06-22 18:37         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-06-22 18:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Boris Ostrovsky, dgdegra, David Vrabel, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 12579 bytes --]

On Mon, Jun 22, 2015 at 01:46:27PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 17, 2015 at 09:42:11PM +0200, Marek Marczykowski-Górecki wrote:
> > On Thu, May 28, 2015 at 01:45:08AM +0200, Marek Marczykowski-Górecki wrote:
> > > On Thu, Apr 30, 2015 at 04:47:44PM +0200, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > > 
> > > > What is the proper way to handle shared pages (either side - using
> > > > gntdev or gntalloc) regarding fork and possible exec later? The child
> > > > process do not need to access those pages in any way, but will map
> > > > different one(s), using newly opened FD to the gntdev/gntalloc device.
> > > > Should it unmap them and close FD to the device manually just after the
> > > > fork? Or the process using gntdev or gntalloc should prevent using fork
> > > > at all?
> > > > 
> > > > I'm asking because I get kernel oops[1] in context of such process. This
> > > > process uses both gntdev and gntalloc. The PID reported there is a
> > > > child, which maps additional pages (using newly opened FD to
> > > > /dev/xen/gnt*), but I'm not sure if the crash happens before, after or
> > > > at this second mapping (actually vchan connection), or maybe even at
> > > > cleanup of this second mapping. The parent process keeps its mappings
> > > > for the whole lifetime of its child. I don't have a 100% reliable way
> > > > to reproduce this problem, but it happens quite often when I run such
> > > > operations in a loop.
> > > 
> > > Any ideas?
> > 
> > I've done some further debugging, below is what I get.
> 
> Woot! Thank you!
> > 
> > > > The kernel is vanilla 3.19.3, running on Xen 4.4.2.
> > > > 
> > > > The kernel message:
> > > > [74376.073464] general protection fault: 0000 [#1] SMP 
> > > > [74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
> > > > [74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G           O   3.19.3-4.pvops.qubes.x86_64 #1
> > > > [74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
> > > > [74376.073532] RIP: e030:[<ffffffffa00952c5>]  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
> > 
> > 	static void unmap_if_in_range(struct grant_map *map,
> > 			  unsigned long start, unsigned long end)
> > 	{   
> > 	    unsigned long mstart, mend;
> > 	    int err;
> > 
> > 	    if (!map->vma)
> > 		return;
> > 
> > The above crash is at first access to "map"...
> > 
> > > > [74376.073543] RSP: e02b:ffff88000032fc08  EFLAGS: 00010292
> > > > [74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
> > > > [74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100
> > 
> > ... which is 0xdead000000100100 (LIST_POISON1).
> > 
> > 
> > > > [74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
> > > > [74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
> > > > [74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
> > > > [74376.073569] FS:  00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
> > > > [74376.073574] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
> > > > [74376.073582] Stack:
> > > > [74376.073584]  ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
> > > > [74376.073590]  0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
> > > > [74376.073596]  ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
> > > > [74376.073603] Call Trace:
> > > > [74376.073610]  [<ffffffffa00953c6>] mn_invl_range_start+0x46/0x90 [xen_gntdev]
> > > > [74376.073620]  [<ffffffff811e88fb>]
> > > > __mmu_notifier_invalidate_range_start+0x5b/0x90
> > > > [74376.073627]  [<ffffffff811c2a59>] do_wp_page+0x769/0x820
> > > > [74376.074031]  [<ffffffff811c4f5c>] handle_mm_fault+0x7fc/0x10c0
> > > > [74376.074031]  [<ffffffff813864cd>] ? radix_tree_lookup+0xd/0x10
> > > > [74376.074031]  [<ffffffff81061e1c>] __do_page_fault+0x1dc/0x5a0
> > > > [74376.074031]  [<ffffffff817560a6>] ? mutex_lock+0x16/0x37
> > > > [74376.074031]  [<ffffffffa0008928>] ? evtchn_ioctl+0x118/0x3c0
> > > > [xen_evtchn]
> > > > [74376.074031]  [<ffffffff812209d8>] ? do_vfs_ioctl+0x2f8/0x4f0
> > > > [74376.074031]  [<ffffffff811cafdf>] ? do_munmap+0x29f/0x3b0
> > > > [74376.074031]  [<ffffffff81062211>] do_page_fault+0x31/0x70
> > > > [74376.074031]  [<ffffffff81759e28>] page_fault+0x28/0x30
> > 
> > 
> > 	static void mn_invl_range_start(struct mmu_notifier *mn,
> > 			struct mm_struct *mm,
> > 			unsigned long start, unsigned long end)
> > 	{
> > 	    struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
> > 	    struct grant_map *map;
> > 
> > 	    spin_lock(&priv->lock);
> > 	    list_for_each_entry(map, &priv->maps, next) {
> > 		unmap_if_in_range(map, start, end);
> > 	    }
> > 
> > mn_invl_range_start+0x46 is the first call to unmap_if_in_range, so
> > something is wrong with priv->maps list.
> > 
> > So I've searched for all the list_del calls on this list and found one not
> > guarded by spinlock:
> > 
> > 	static int gntdev_release(struct inode *inode, struct file *flip)
> > 	{   
> > 	    struct gntdev_priv *priv = flip->private_data;
> > 	    struct grant_map *map;
> > 
> > 	    pr_debug("priv %p\n", priv);
> > 
> > 	    while (!list_empty(&priv->maps)) {
> > 		map = list_entry(priv->maps.next, struct grant_map, next);
> > 		list_del(&map->next);
> > 		gntdev_put_map(NULL /* already removed */, map);
> > 	    }
> > 	    WARN_ON(!list_empty(&priv->freeable_maps));
> > 
> > 	    if (use_ptemod)
> > 		mmu_notifier_unregister(&priv->mn, priv->mm);
> > 	    kfree(priv);
> > 	    return 0;
> > 	}   
> > 
> > So I see this as:
> > P1(parent)                 P2(child)
> >                            1. gntdev_release called
> >                            2. list destroyed (above loop)
> > 
> > 3. page fault occurs, gntdev mmu notifier called
> > 4. priv->lock taken
> > 5. iterate over priv->maps
> 
> 
> Could we check the map->users?

I don't have an easy way to reproduce the problem. It happens randomly
once a day/two in some random domain. I've tried to fiddle with
gnttab+fork+page faults, but failed to reliably reproduce the problem.

> > 6. crashed since map is already destroyed
> 
> Oh, indeed. gntdev_release calls gntdev_put_map which frees the whole
> thing! Ouch.
> > 
> >                            7. mmu_notifier_unregister calls:
> >                            8.   mn_release, which tries to take priv->lock
> >                            9. this process hangs
> > 
> > So I'd guess the fix would be to move mmu_notifier_unregister before
> > releasing priv->maps list. Could someone more familiar with this code
> > confirm this?
> 
> Hey Marek,
> 
> Could we just add an lock around the 'gntdev_release' loop?

I don't fully understand what is going there, especially when
gntdev_release is called (in case of forked process, still some memory
mapped etc), and when mmu notifier is called. But just looking at this
crash IMO adding a lock should be sufficient. What will happen when mmu notifier
would not find the mapping? I guess nothing, because the mapping is just
removed for a reason, right?

> Daniel added this code to guard against a similar race:
> 
> commit 16a1d0225e22e4e273e6b60a21db95decde666c2
> Author: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> Date:   Wed Jan 2 22:57:12 2013 +0000
> 
>     xen/gntdev: correctly unmap unlinked maps in mmu notifier
>     
>     If gntdev_ioctl_unmap_grant_ref is called on a range before unmapping
>     it, the entry is removed from priv->maps and the later call to
>     mn_invl_range_start won't find it to do the unmapping. Fix this by
>     creating another list of freeable maps that the mmu notifier can search
>     and use to unmap grants.
>     
>     Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> 
> Lets see what he thinks.
> > 
> > > > [74376.074031] Code: e9 dd fd ff ff 31 c9 31 db e9 20 fe ff ff 0f 1f
> > > > 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 89 fb
> > > > 48 83 ec 28 <48> 8b 47 10 48 85 c0 74 4e 4c 8b 00 49 39 d0 73 46 4c
> > > > 8b 48 08
> > > > [74376.074031] RIP  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0
> > > > [xen_gntdev]
> > > > [74376.074031]  RSP <ffff88000032fc08>
> > > > [74376.091682] ---[ end trace 2b21c5b714eb1071 ]---
> > 
> > And the other process:
> > 
> > > > [74404.069009] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s!
> > > > [qrexec-agent:9379]
> > > > [74404.069009] Modules linked in: fuse xt_conntrack ipt_MASQUERADE
> > > > nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> > > > nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables
> > > > intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul
> > > > crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel
> > > > nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd
> > > > udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O)
> > > > xen_evtchn xen_blkfront
> > > > [74404.069009] CPU: 2 PID: 9379 Comm: qrexec-agent Tainted: G      D
> > > > O   3.19.3-4.pvops.qubes.x86_64 #1
> > > > [74404.069009] task: ffff880010e24a00 ti: ffff880002470000 task.ti:
> > > > ffff880002470000
> > > > [74404.069009] RIP: e030:[<ffffffff81757b11>]  [<ffffffff81757b11>]
> > > > _raw_spin_lock+0x21/0x30
> > > > [74404.069009] RSP: e02b:ffff880002473e18  EFLAGS: 00000297
> > > > [74404.069009] RAX: 0000000000000040 RBX: ffff880002345c00 RCX:
> > > > 0000000000018cf8
> > > > [74404.069009] RDX: 0000000000000041 RSI: ffff880002345c00 RDI:
> > > > ffff880012702e60
> > > > [74404.069009] RBP: ffff880002473e18 R08: ffff880012702240 R09:
> > > > 00000001802a0019
> > > > [74404.069009] R10: ffffea000049c080 R11: ffffffffa00955bf R12:
> > > > ffff880012702e70
> > > > [74404.069009] R13: ffff880012702e40 R14: ffff8800132c6f20 R15:
> > > > ffff880012b163c0
> > > > [74404.069009] FS:  00007fd8616ca700(0000) GS:ffff880013d00000(0000)
> > > > knlGS:0000000000000000
> > > > [74404.069009] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [74404.069009] CR2: 00007fd8610be098 CR3: 000000000b971000 CR4:
> > > > 0000000000042660
> > > > [74404.069009] Stack:
> > > > [74404.069009]  ffff880002473e48 ffffffffa0095452 ffff880002473e48
> > > > ffff880002345c00
> > > > [74404.069009]  ffff880012702e70 0000000000000000 ffff880002473e78
> > > > ffffffff811e8c2e
> > > > [74404.069009]  ffff880002473e78 ffff880012702e40 ffff880012702e40
> > > > ffff880012d123c8
> > > > [74404.069009] Call Trace:
> > > > [74404.069009]  [<ffffffffa0095452>] mn_release+0x22/0x130
> > > > [xen_gntdev]
> > > > [74404.069009]  [<ffffffff811e8c2e>]
> > > > mmu_notifier_unregister+0x4e/0xe0
> > > > [74404.069009]  [<ffffffffa00957c0>] gntdev_release+0x60/0xa0
> > > > [xen_gntdev]
> > > > [74404.069009]  [<ffffffff8120ec0f>] __fput+0xdf/0x1e0
> > > > [74404.069009]  [<ffffffff8120ed5e>] ____fput+0xe/0x10
> > > > [74404.069009]  [<ffffffff810b56df>] task_work_run+0xbf/0x100
> > > > [74404.069009]  [<ffffffff81014c47>] do_notify_resume+0x97/0xb0
> > > > [74404.069009]  [<ffffffff81758127>] int_signal+0x12/0x17
> > > > [74404.069009] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> > > > 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 75 04 5d c3 f3
> > > > 90 0f b6 07 <38> d0 75 f7 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66
> > > > 66 90 55
> > 
> > -- 
> > Best Regards,
> > Marek Marczykowski-Górecki
> > Invisible Things Lab
> > A: Because it messes up the order in which people normally read text.
> > Q: Why is top-posting such a bad thing?
> 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev (was: Re: gntdev/gntalloc and fork? - crash in gntdev)
  2015-06-22 18:13       ` Marek Marczykowski-Górecki
@ 2015-06-22 18:37         ` Konrad Rzeszutek Wilk
  2015-06-22 19:14           ` race condition in xen-gntdev Daniel De Graaf
  0 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-22 18:37 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Boris Ostrovsky, dgdegra, David Vrabel, xen-devel

On Mon, Jun 22, 2015 at 08:13:35PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Jun 22, 2015 at 01:46:27PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jun 17, 2015 at 09:42:11PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Thu, May 28, 2015 at 01:45:08AM +0200, Marek Marczykowski-Górecki wrote:
> > > > On Thu, Apr 30, 2015 at 04:47:44PM +0200, Marek Marczykowski-Górecki wrote:
> > > > > Hi,
> > > > > 
> > > > > What is the proper way to handle shared pages (either side - using
> > > > > gntdev or gntalloc) regarding fork and possible exec later? The child
> > > > > process do not need to access those pages in any way, but will map
> > > > > different one(s), using newly opened FD to the gntdev/gntalloc device.
> > > > > Should it unmap them and close FD to the device manually just after the
> > > > > fork? Or the process using gntdev or gntalloc should prevent using fork
> > > > > at all?
> > > > > 
> > > > > I'm asking because I get kernel oops[1] in context of such process. This
> > > > > process uses both gntdev and gntalloc. The PID reported there is a
> > > > > child, which maps additional pages (using newly opened FD to
> > > > > /dev/xen/gnt*), but I'm not sure if the crash happens before, after or
> > > > > at this second mapping (actually vchan connection), or maybe even at
> > > > > cleanup of this second mapping. The parent process keeps its mappings
> > > > > for the whole lifetime of its child. I don't have a 100% reliable way
> > > > > to reproduce this problem, but it happens quite often when I run such
> > > > > operations in a loop.
> > > > 
> > > > Any ideas?
> > > 
> > > I've done some further debugging, below is what I get.
> > 
> > Woot! Thank you!
> > > 
> > > > > The kernel is vanilla 3.19.3, running on Xen 4.4.2.
> > > > > 
> > > > > The kernel message:
> > > > > [74376.073464] general protection fault: 0000 [#1] SMP 
> > > > > [74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
> > > > > [74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G           O   3.19.3-4.pvops.qubes.x86_64 #1
> > > > > [74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
> > > > > [74376.073532] RIP: e030:[<ffffffffa00952c5>]  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
> > > 
> > > 	static void unmap_if_in_range(struct grant_map *map,
> > > 			  unsigned long start, unsigned long end)
> > > 	{   
> > > 	    unsigned long mstart, mend;
> > > 	    int err;
> > > 
> > > 	    if (!map->vma)
> > > 		return;
> > > 
> > > The above crash is at first access to "map"...
> > > 
> > > > > [74376.073543] RSP: e02b:ffff88000032fc08  EFLAGS: 00010292
> > > > > [74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
> > > > > [74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100
> > > 
> > > ... which is 0xdead000000100100 (LIST_POISON1).
> > > 
> > > 
> > > > > [74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
> > > > > [74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
> > > > > [74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
> > > > > [74376.073569] FS:  00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
> > > > > [74376.073574] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
> > > > > [74376.073582] Stack:
> > > > > [74376.073584]  ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
> > > > > [74376.073590]  0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
> > > > > [74376.073596]  ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
> > > > > [74376.073603] Call Trace:
> > > > > [74376.073610]  [<ffffffffa00953c6>] mn_invl_range_start+0x46/0x90 [xen_gntdev]
> > > > > [74376.073620]  [<ffffffff811e88fb>]
> > > > > __mmu_notifier_invalidate_range_start+0x5b/0x90
> > > > > [74376.073627]  [<ffffffff811c2a59>] do_wp_page+0x769/0x820
> > > > > [74376.074031]  [<ffffffff811c4f5c>] handle_mm_fault+0x7fc/0x10c0
> > > > > [74376.074031]  [<ffffffff813864cd>] ? radix_tree_lookup+0xd/0x10
> > > > > [74376.074031]  [<ffffffff81061e1c>] __do_page_fault+0x1dc/0x5a0
> > > > > [74376.074031]  [<ffffffff817560a6>] ? mutex_lock+0x16/0x37
> > > > > [74376.074031]  [<ffffffffa0008928>] ? evtchn_ioctl+0x118/0x3c0
> > > > > [xen_evtchn]
> > > > > [74376.074031]  [<ffffffff812209d8>] ? do_vfs_ioctl+0x2f8/0x4f0
> > > > > [74376.074031]  [<ffffffff811cafdf>] ? do_munmap+0x29f/0x3b0
> > > > > [74376.074031]  [<ffffffff81062211>] do_page_fault+0x31/0x70
> > > > > [74376.074031]  [<ffffffff81759e28>] page_fault+0x28/0x30
> > > 
> > > 
> > > 	static void mn_invl_range_start(struct mmu_notifier *mn,
> > > 			struct mm_struct *mm,
> > > 			unsigned long start, unsigned long end)
> > > 	{
> > > 	    struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
> > > 	    struct grant_map *map;
> > > 
> > > 	    spin_lock(&priv->lock);
> > > 	    list_for_each_entry(map, &priv->maps, next) {
> > > 		unmap_if_in_range(map, start, end);
> > > 	    }
> > > 
> > > mn_invl_range_start+0x46 is the first call to unmap_if_in_range, so
> > > something is wrong with priv->maps list.
> > > 
> > > So I've searched for all the list_del calls on this list and found one not
> > > guarded by spinlock:
> > > 
> > > 	static int gntdev_release(struct inode *inode, struct file *flip)
> > > 	{   
> > > 	    struct gntdev_priv *priv = flip->private_data;
> > > 	    struct grant_map *map;
> > > 
> > > 	    pr_debug("priv %p\n", priv);
> > > 
> > > 	    while (!list_empty(&priv->maps)) {
> > > 		map = list_entry(priv->maps.next, struct grant_map, next);
> > > 		list_del(&map->next);
> > > 		gntdev_put_map(NULL /* already removed */, map);
> > > 	    }
> > > 	    WARN_ON(!list_empty(&priv->freeable_maps));
> > > 
> > > 	    if (use_ptemod)
> > > 		mmu_notifier_unregister(&priv->mn, priv->mm);
> > > 	    kfree(priv);
> > > 	    return 0;
> > > 	}   
> > > 
> > > So I see this as:
> > > P1(parent)                 P2(child)
> > >                            1. gntdev_release called
> > >                            2. list destroyed (above loop)
> > > 
> > > 3. page fault occurs, gntdev mmu notifier called
> > > 4. priv->lock taken
> > > 5. iterate over priv->maps
> > 
> > 
> > Could we check the map->users?
> 
> I don't have an easy way to reproduce the problem. It happens randomly
> once a day/two in some random domain. I've tried to fiddle with
> gnttab+fork+page faults, but failed to reliably reproduce the problem.
> 
> > > 6. crashed since map is already destroyed
> > 
> > Oh, indeed. gntdev_release calls gntdev_put_map which frees the whole
> > thing! Ouch.
> > > 
> > >                            7. mmu_notifier_unregister calls:
> > >                            8.   mn_release, which tries to take priv->lock
> > >                            9. this process hangs
> > > 
> > > So I'd guess the fix would be to move mmu_notifier_unregister before
> > > releasing priv->maps list. Could someone more familiar with this code
> > > confirm this?
> > 
> > Hey Marek,
> > 
> > Could we just add an lock around the 'gntdev_release' loop?
> 
> I don't fully understand what is going there, especially when
> gntdev_release is called (in case of forked process, still some memory
> mapped etc), and when mmu notifier is called. But just looking at this
> crash IMO adding a lock should be sufficient. What will happen when mmu notifier
> would not find the mapping? I guess nothing, because the mapping is just
> removed for a reason, right?

That was my thought. But then why didn't we add this lock in the first place?
Hopefully Daniel can remember ~2yr patch :-)

> 
> > Daniel added this code to guard against a similar race:
> > 
> > commit 16a1d0225e22e4e273e6b60a21db95decde666c2
> > Author: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > Date:   Wed Jan 2 22:57:12 2013 +0000
> > 
> >     xen/gntdev: correctly unmap unlinked maps in mmu notifier
> >     
> >     If gntdev_ioctl_unmap_grant_ref is called on a range before unmapping
> >     it, the entry is removed from priv->maps and the later call to
> >     mn_invl_range_start won't find it to do the unmapping. Fix this by
> >     creating another list of freeable maps that the mmu notifier can search
> >     and use to unmap grants.
> >     
> >     Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > 
> > Lets see what he thinks.
> > > 
> > > > > [74376.074031] Code: e9 dd fd ff ff 31 c9 31 db e9 20 fe ff ff 0f 1f
> > > > > 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 89 fb
> > > > > 48 83 ec 28 <48> 8b 47 10 48 85 c0 74 4e 4c 8b 00 49 39 d0 73 46 4c
> > > > > 8b 48 08
> > > > > [74376.074031] RIP  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0
> > > > > [xen_gntdev]
> > > > > [74376.074031]  RSP <ffff88000032fc08>
> > > > > [74376.091682] ---[ end trace 2b21c5b714eb1071 ]---
> > > 
> > > And the other process:
> > > 
> > > > > [74404.069009] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s!
> > > > > [qrexec-agent:9379]
> > > > > [74404.069009] Modules linked in: fuse xt_conntrack ipt_MASQUERADE
> > > > > nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> > > > > nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables
> > > > > intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul
> > > > > crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel
> > > > > nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd
> > > > > udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O)
> > > > > xen_evtchn xen_blkfront
> > > > > [74404.069009] CPU: 2 PID: 9379 Comm: qrexec-agent Tainted: G      D
> > > > > O   3.19.3-4.pvops.qubes.x86_64 #1
> > > > > [74404.069009] task: ffff880010e24a00 ti: ffff880002470000 task.ti:
> > > > > ffff880002470000
> > > > > [74404.069009] RIP: e030:[<ffffffff81757b11>]  [<ffffffff81757b11>]
> > > > > _raw_spin_lock+0x21/0x30
> > > > > [74404.069009] RSP: e02b:ffff880002473e18  EFLAGS: 00000297
> > > > > [74404.069009] RAX: 0000000000000040 RBX: ffff880002345c00 RCX:
> > > > > 0000000000018cf8
> > > > > [74404.069009] RDX: 0000000000000041 RSI: ffff880002345c00 RDI:
> > > > > ffff880012702e60
> > > > > [74404.069009] RBP: ffff880002473e18 R08: ffff880012702240 R09:
> > > > > 00000001802a0019
> > > > > [74404.069009] R10: ffffea000049c080 R11: ffffffffa00955bf R12:
> > > > > ffff880012702e70
> > > > > [74404.069009] R13: ffff880012702e40 R14: ffff8800132c6f20 R15:
> > > > > ffff880012b163c0
> > > > > [74404.069009] FS:  00007fd8616ca700(0000) GS:ffff880013d00000(0000)
> > > > > knlGS:0000000000000000
> > > > > [74404.069009] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [74404.069009] CR2: 00007fd8610be098 CR3: 000000000b971000 CR4:
> > > > > 0000000000042660
> > > > > [74404.069009] Stack:
> > > > > [74404.069009]  ffff880002473e48 ffffffffa0095452 ffff880002473e48
> > > > > ffff880002345c00
> > > > > [74404.069009]  ffff880012702e70 0000000000000000 ffff880002473e78
> > > > > ffffffff811e8c2e
> > > > > [74404.069009]  ffff880002473e78 ffff880012702e40 ffff880012702e40
> > > > > ffff880012d123c8
> > > > > [74404.069009] Call Trace:
> > > > > [74404.069009]  [<ffffffffa0095452>] mn_release+0x22/0x130
> > > > > [xen_gntdev]
> > > > > [74404.069009]  [<ffffffff811e8c2e>]
> > > > > mmu_notifier_unregister+0x4e/0xe0
> > > > > [74404.069009]  [<ffffffffa00957c0>] gntdev_release+0x60/0xa0
> > > > > [xen_gntdev]
> > > > > [74404.069009]  [<ffffffff8120ec0f>] __fput+0xdf/0x1e0
> > > > > [74404.069009]  [<ffffffff8120ed5e>] ____fput+0xe/0x10
> > > > > [74404.069009]  [<ffffffff810b56df>] task_work_run+0xbf/0x100
> > > > > [74404.069009]  [<ffffffff81014c47>] do_notify_resume+0x97/0xb0
> > > > > [74404.069009]  [<ffffffff81758127>] int_signal+0x12/0x17
> > > > > [74404.069009] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> > > > > 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 75 04 5d c3 f3
> > > > > 90 0f b6 07 <38> d0 75 f7 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66
> > > > > 66 90 55
> > > 
> > > -- 
> > > Best Regards,
> > > Marek Marczykowski-Górecki
> > > Invisible Things Lab
> > > A: Because it messes up the order in which people normally read text.
> > > Q: Why is top-posting such a bad thing?
> > 
> > 
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev
  2015-06-22 18:37         ` Konrad Rzeszutek Wilk
@ 2015-06-22 19:14           ` Daniel De Graaf
  2015-06-26  1:28             ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel De Graaf @ 2015-06-22 19:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Marek Marczykowski-Górecki
  Cc: Boris Ostrovsky, David Vrabel, xen-devel

On 06/22/2015 02:37 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Jun 22, 2015 at 08:13:35PM +0200, Marek Marczykowski-Górecki wrote:
>> On Mon, Jun 22, 2015 at 01:46:27PM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Jun 17, 2015 at 09:42:11PM +0200, Marek Marczykowski-Górecki wrote:
>>>> On Thu, May 28, 2015 at 01:45:08AM +0200, Marek Marczykowski-Górecki wrote:
>>>>> On Thu, Apr 30, 2015 at 04:47:44PM +0200, Marek Marczykowski-Górecki wrote:
>>>>>> Hi,
>>>>>>
>>>>>> What is the proper way to handle shared pages (either side - using
>>>>>> gntdev or gntalloc) regarding fork and possible exec later? The child
>>>>>> process do not need to access those pages in any way, but will map
>>>>>> different one(s), using newly opened FD to the gntdev/gntalloc device.
>>>>>> Should it unmap them and close FD to the device manually just after the
>>>>>> fork? Or the process using gntdev or gntalloc should prevent using fork
>>>>>> at all?
>>>>>>
>>>>>> I'm asking because I get kernel oops[1] in context of such process. This
>>>>>> process uses both gntdev and gntalloc. The PID reported there is a
>>>>>> child, which maps additional pages (using newly opened FD to
>>>>>> /dev/xen/gnt*), but I'm not sure if the crash happens before, after or
>>>>>> at this second mapping (actually vchan connection), or maybe even at
>>>>>> cleanup of this second mapping. The parent process keeps its mappings
>>>>>> for the whole lifetime of its child. I don't have a 100% reliable way
>>>>>> to reproduce this problem, but it happens quite often when I run such
>>>>>> operations in a loop.
>>>>>
>>>>> Any ideas?
>>>>
>>>> I've done some further debugging, below is what I get.
>>>
>>> Woot! Thank you!
>>>>
>>>>>> The kernel is vanilla 3.19.3, running on Xen 4.4.2.
>>>>>>
>>>>>> The kernel message:
>>>>>> [74376.073464] general protection fault: 0000 [#1] SMP
>>>>>> [74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
>>>>>> [74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G           O   3.19.3-4.pvops.qubes.x86_64 #1
>>>>>> [74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
>>>>>> [74376.073532] RIP: e030:[<ffffffffa00952c5>]  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
>>>>
>>>> 	static void unmap_if_in_range(struct grant_map *map,
>>>> 			  unsigned long start, unsigned long end)
>>>> 	{
>>>> 	    unsigned long mstart, mend;
>>>> 	    int err;
>>>>
>>>> 	    if (!map->vma)
>>>> 		return;
>>>>
>>>> The above crash is at first access to "map"...
>>>>
>>>>>> [74376.073543] RSP: e02b:ffff88000032fc08  EFLAGS: 00010292
>>>>>> [74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
>>>>>> [74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100
>>>>
>>>> ... which is 0xdead000000100100 (LIST_POISON1).
>>>>
>>>>
>>>>>> [74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
>>>>>> [74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
>>>>>> [74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
>>>>>> [74376.073569] FS:  00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
>>>>>> [74376.073574] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
>>>>>> [74376.073582] Stack:
>>>>>> [74376.073584]  ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
>>>>>> [74376.073590]  0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
>>>>>> [74376.073596]  ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
>>>>>> [74376.073603] Call Trace:
>>>>>> [74376.073610]  [<ffffffffa00953c6>] mn_invl_range_start+0x46/0x90 [xen_gntdev]
>>>>>> [74376.073620]  [<ffffffff811e88fb>]
>>>>>> __mmu_notifier_invalidate_range_start+0x5b/0x90
>>>>>> [74376.073627]  [<ffffffff811c2a59>] do_wp_page+0x769/0x820
>>>>>> [74376.074031]  [<ffffffff811c4f5c>] handle_mm_fault+0x7fc/0x10c0
>>>>>> [74376.074031]  [<ffffffff813864cd>] ? radix_tree_lookup+0xd/0x10
>>>>>> [74376.074031]  [<ffffffff81061e1c>] __do_page_fault+0x1dc/0x5a0
>>>>>> [74376.074031]  [<ffffffff817560a6>] ? mutex_lock+0x16/0x37
>>>>>> [74376.074031]  [<ffffffffa0008928>] ? evtchn_ioctl+0x118/0x3c0
>>>>>> [xen_evtchn]
>>>>>> [74376.074031]  [<ffffffff812209d8>] ? do_vfs_ioctl+0x2f8/0x4f0
>>>>>> [74376.074031]  [<ffffffff811cafdf>] ? do_munmap+0x29f/0x3b0
>>>>>> [74376.074031]  [<ffffffff81062211>] do_page_fault+0x31/0x70
>>>>>> [74376.074031]  [<ffffffff81759e28>] page_fault+0x28/0x30
>>>>
>>>>
>>>> 	static void mn_invl_range_start(struct mmu_notifier *mn,
>>>> 			struct mm_struct *mm,
>>>> 			unsigned long start, unsigned long end)
>>>> 	{
>>>> 	    struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
>>>> 	    struct grant_map *map;
>>>>
>>>> 	    spin_lock(&priv->lock);
>>>> 	    list_for_each_entry(map, &priv->maps, next) {
>>>> 		unmap_if_in_range(map, start, end);
>>>> 	    }
>>>>
>>>> mn_invl_range_start+0x46 is the first call to unmap_if_in_range, so
>>>> something is wrong with priv->maps list.
>>>>
>>>> So I've searched for all the list_del calls on this list and found one not
>>>> guarded by spinlock:
>>>>
>>>> 	static int gntdev_release(struct inode *inode, struct file *flip)
>>>> 	{
>>>> 	    struct gntdev_priv *priv = flip->private_data;
>>>> 	    struct grant_map *map;
>>>>
>>>> 	    pr_debug("priv %p\n", priv);
>>>>
>>>> 	    while (!list_empty(&priv->maps)) {
>>>> 		map = list_entry(priv->maps.next, struct grant_map, next);
>>>> 		list_del(&map->next);
>>>> 		gntdev_put_map(NULL /* already removed */, map);
>>>> 	    }
>>>> 	    WARN_ON(!list_empty(&priv->freeable_maps));
>>>>
>>>> 	    if (use_ptemod)
>>>> 		mmu_notifier_unregister(&priv->mn, priv->mm);
>>>> 	    kfree(priv);
>>>> 	    return 0;
>>>> 	}
>>>>
>>>> So I see this as:
>>>> P1(parent)                 P2(child)
>>>>                             1. gntdev_release called
>>>>                             2. list destroyed (above loop)
>>>>
>>>> 3. page fault occurs, gntdev mmu notifier called
>>>> 4. priv->lock taken
>>>> 5. iterate over priv->maps
>>>
>>>
>>> Could we check the map->users?
>>
>> I don't have an easy way to reproduce the problem. It happens randomly
>> once a day/two in some random domain. I've tried to fiddle with
>> gnttab+fork+page faults, but failed to reliably reproduce the problem.
>>
>>>> 6. crashed since map is already destroyed
>>>
>>> Oh, indeed. gntdev_release calls gntdev_put_map which frees the whole
>>> thing! Ouch.
>>>>
>>>>                             7. mmu_notifier_unregister calls:
>>>>                             8.   mn_release, which tries to take priv->lock
>>>>                             9. this process hangs
>>>>
>>>> So I'd guess the fix would be to move mmu_notifier_unregister before
>>>> releasing priv->maps list. Could someone more familiar with this code
>>>> confirm this?
>>>
>>> Hey Marek,
>>>
>>> Could we just add an lock around the 'gntdev_release' loop?
>>
>> I don't fully understand what is going there, especially when
>> gntdev_release is called (in case of forked process, still some memory
>> mapped etc), and when mmu notifier is called. But just looking at this
>> crash IMO adding a lock should be sufficient. What will happen when mmu notifier
>> would not find the mapping? I guess nothing, because the mapping is just
>> removed for a reason, right?
>
> That was my thought. But then why didn't we add this lock in the first place?
> Hopefully Daniel can remember ~2yr patch :-)

The reason that gntdev_release didn't have a lock is because there are not
supposed to be any references to the areas pointed to by priv->maps when it
is called.  However, since the MMU notifier has not yet been unregistered,
it is apparently possible to race here; the comment on mmu_notifier_unregister
seems to confirm this as a possibility (as do the backtraces).

I think adding the lock will be sufficient.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev
  2015-06-22 19:14           ` race condition in xen-gntdev Daniel De Graaf
@ 2015-06-26  1:28             ` Marek Marczykowski-Górecki
  2015-06-29 14:39               ` Konrad Rzeszutek Wilk
  2015-06-30 15:26               ` David Vrabel
  0 siblings, 2 replies; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-06-26  1:28 UTC (permalink / raw)
  To: Daniel De Graaf; +Cc: Boris Ostrovsky, xen-devel, David Vrabel


[-- Attachment #1.1: Type: text/plain, Size: 2111 bytes --]

On Mon, Jun 22, 2015 at 03:14:16PM -0400, Daniel De Graaf wrote:
> The reason that gntdev_release didn't have a lock is because there are not
> supposed to be any references to the areas pointed to by priv->maps when it
> is called.  However, since the MMU notifier has not yet been unregistered,
> it is apparently possible to race here; the comment on mmu_notifier_unregister
> seems to confirm this as a possibility (as do the backtraces).
> 
> I think adding the lock will be sufficient.

Ok, so here is the patch:

-----------8<------------

From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
 <marmarek@invisiblethingslab.com>
Date: Fri, 26 Jun 2015 02:16:49 +0200
Subject: [PATCH] xen/grant: fix race condition in gntdev_release

While gntdev_release is called, MMU notifier is still registered and
can traverse priv->maps list even if no pages are mapped (which is the
case - gntdev_release is called after all). But gntdev_release will
clear that list, so make sure that only one of those things happens at
the same time.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
 drivers/xen/gntdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 8927485..4bd23bb 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -568,12 +568,14 @@ static int gntdev_release(struct inode *inode, struct file *flip)
 
 	pr_debug("priv %p\n", priv);
 
+	mutex_lock(&priv->lock);
 	while (!list_empty(&priv->maps)) {
 		map = list_entry(priv->maps.next, struct grant_map, next);
 		list_del(&map->next);
 		gntdev_put_map(NULL /* already removed */, map);
 	}
 	WARN_ON(!list_empty(&priv->freeable_maps));
+	mutex_unlock(&priv->lock);
 
 	if (use_ptemod)
 		mmu_notifier_unregister(&priv->mn, priv->mm);
-- 
1.9.3


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev
  2015-06-26  1:28             ` Marek Marczykowski-Górecki
@ 2015-06-29 14:39               ` Konrad Rzeszutek Wilk
  2015-06-29 14:50                 ` Marek Marczykowski-Górecki
  2015-06-30 15:26               ` David Vrabel
  1 sibling, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-29 14:39 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Boris Ostrovsky, Daniel De Graaf, David Vrabel, xen-devel

On Fri, Jun 26, 2015 at 03:28:24AM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Jun 22, 2015 at 03:14:16PM -0400, Daniel De Graaf wrote:
> > The reason that gntdev_release didn't have a lock is because there are not
> > supposed to be any references to the areas pointed to by priv->maps when it
> > is called.  However, since the MMU notifier has not yet been unregistered,
> > it is apparently possible to race here; the comment on mmu_notifier_unregister
> > seems to confirm this as a possibility (as do the backtraces).
> > 
> > I think adding the lock will be sufficient.
> 
> Ok, so here is the patch:

Awesome!

Since you are the one who has been seeing this particular fault - any chance
you could give it some soak time? If I recall your emails correctly it takes
about a week or so before you saw the crash?

> 
> -----------8<------------
> 
> From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
>  <marmarek@invisiblethingslab.com>
> Date: Fri, 26 Jun 2015 02:16:49 +0200
> Subject: [PATCH] xen/grant: fix race condition in gntdev_release
> 
> While gntdev_release is called, MMU notifier is still registered and
> can traverse priv->maps list even if no pages are mapped (which is the
> case - gntdev_release is called after all). But gntdev_release will
> clear that list, so make sure that only one of those things happens at
> the same time.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
>  drivers/xen/gntdev.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 8927485..4bd23bb 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -568,12 +568,14 @@ static int gntdev_release(struct inode *inode, struct file *flip)
>  
>  	pr_debug("priv %p\n", priv);
>  
> +	mutex_lock(&priv->lock);
>  	while (!list_empty(&priv->maps)) {
>  		map = list_entry(priv->maps.next, struct grant_map, next);
>  		list_del(&map->next);
>  		gntdev_put_map(NULL /* already removed */, map);
>  	}
>  	WARN_ON(!list_empty(&priv->freeable_maps));
> +	mutex_unlock(&priv->lock);
>  
>  	if (use_ptemod)
>  		mmu_notifier_unregister(&priv->mn, priv->mm);
> -- 
> 1.9.3
> 
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev
  2015-06-29 14:39               ` Konrad Rzeszutek Wilk
@ 2015-06-29 14:50                 ` Marek Marczykowski-Górecki
  2015-07-22  3:21                   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-06-29 14:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Boris Ostrovsky, Daniel De Graaf, David Vrabel, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3007 bytes --]

On Mon, Jun 29, 2015 at 10:39:26AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Jun 26, 2015 at 03:28:24AM +0200, Marek Marczykowski-Górecki wrote:
> > On Mon, Jun 22, 2015 at 03:14:16PM -0400, Daniel De Graaf wrote:
> > > The reason that gntdev_release didn't have a lock is because there are not
> > > supposed to be any references to the areas pointed to by priv->maps when it
> > > is called.  However, since the MMU notifier has not yet been unregistered,
> > > it is apparently possible to race here; the comment on mmu_notifier_unregister
> > > seems to confirm this as a possibility (as do the backtraces).
> > > 
> > > I think adding the lock will be sufficient.
> > 
> > Ok, so here is the patch:
> 
> Awesome!
> 
> Since you are the one who has been seeing this particular fault - any chance
> you could give it some soak time? If I recall your emails correctly it takes
> about a week or so before you saw the crash?

Sure. I've already installed patched kernel, will report back results
later.

> 
> > 
> > -----------8<------------
> > 
> > From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00 2001
> > From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
> >  <marmarek@invisiblethingslab.com>
> > Date: Fri, 26 Jun 2015 02:16:49 +0200
> > Subject: [PATCH] xen/grant: fix race condition in gntdev_release
> > 
> > While gntdev_release is called, MMU notifier is still registered and
> > can traverse priv->maps list even if no pages are mapped (which is the
> > case - gntdev_release is called after all). But gntdev_release will
> > clear that list, so make sure that only one of those things happens at
> > the same time.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> >  drivers/xen/gntdev.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > index 8927485..4bd23bb 100644
> > --- a/drivers/xen/gntdev.c
> > +++ b/drivers/xen/gntdev.c
> > @@ -568,12 +568,14 @@ static int gntdev_release(struct inode *inode, struct file *flip)
> >  
> >  	pr_debug("priv %p\n", priv);
> >  
> > +	mutex_lock(&priv->lock);
> >  	while (!list_empty(&priv->maps)) {
> >  		map = list_entry(priv->maps.next, struct grant_map, next);
> >  		list_del(&map->next);
> >  		gntdev_put_map(NULL /* already removed */, map);
> >  	}
> >  	WARN_ON(!list_empty(&priv->freeable_maps));
> > +	mutex_unlock(&priv->lock);
> >  
> >  	if (use_ptemod)
> >  		mmu_notifier_unregister(&priv->mn, priv->mm);
> > -- 
> > 1.9.3
> > 
> > 
> > -- 
> > Best Regards,
> > Marek Marczykowski-Górecki
> > Invisible Things Lab
> > A: Because it messes up the order in which people normally read text.
> > Q: Why is top-posting such a bad thing?
> 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev
  2015-06-26  1:28             ` Marek Marczykowski-Górecki
  2015-06-29 14:39               ` Konrad Rzeszutek Wilk
@ 2015-06-30 15:26               ` David Vrabel
  2015-06-30 15:51                 ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 14+ messages in thread
From: David Vrabel @ 2015-06-30 15:26 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, Daniel De Graaf
  Cc: Boris Ostrovsky, David Vrabel, xen-devel

On 26/06/15 02:28, Marek Marczykowski-Górecki wrote:
> 
> From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
>  <marmarek@invisiblethingslab.com>
> Date: Fri, 26 Jun 2015 02:16:49 +0200
> Subject: [PATCH] xen/grant: fix race condition in gntdev_release
> 
> While gntdev_release is called, MMU notifier is still registered and
> can traverse priv->maps list even if no pages are mapped (which is the
> case - gntdev_release is called after all). But gntdev_release will
> clear that list, so make sure that only one of those things happens at
> the same time.

Applied to for-linus-4.2 and tagged for stable, thanks.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev
  2015-06-30 15:26               ` David Vrabel
@ 2015-06-30 15:51                 ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-06-30 15:51 UTC (permalink / raw)
  To: David Vrabel; +Cc: Daniel De Graaf, Boris Ostrovsky, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1126 bytes --]

On Tue, Jun 30, 2015 at 04:26:45PM +0100, David Vrabel wrote:
> On 26/06/15 02:28, Marek Marczykowski-Górecki wrote:
> > 
> > From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00 2001
> > From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
> >  <marmarek@invisiblethingslab.com>
> > Date: Fri, 26 Jun 2015 02:16:49 +0200
> > Subject: [PATCH] xen/grant: fix race condition in gntdev_release
> > 
> > While gntdev_release is called, MMU notifier is still registered and
> > can traverse priv->maps list even if no pages are mapped (which is the
> > case - gntdev_release is called after all). But gntdev_release will
> > clear that list, so make sure that only one of those things happens at
> > the same time.
> 
> Applied to for-linus-4.2 and tagged for stable, thanks.

Regarding stable backports - remember that before 4.0, there was a spin
lock not a mutex. I've stumbled on this while testing on 3.18...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev
  2015-06-29 14:50                 ` Marek Marczykowski-Górecki
@ 2015-07-22  3:21                   ` Marek Marczykowski-Górecki
  2015-07-22 13:58                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-07-22  3:21 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Boris Ostrovsky, Daniel De Graaf, David Vrabel, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3293 bytes --]

On Mon, Jun 29, 2015 at 04:50:10PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Jun 29, 2015 at 10:39:26AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jun 26, 2015 at 03:28:24AM +0200, Marek Marczykowski-Górecki wrote:
> > > On Mon, Jun 22, 2015 at 03:14:16PM -0400, Daniel De Graaf wrote:
> > > > The reason that gntdev_release didn't have a lock is because there are not
> > > > supposed to be any references to the areas pointed to by priv->maps when it
> > > > is called.  However, since the MMU notifier has not yet been unregistered,
> > > > it is apparently possible to race here; the comment on mmu_notifier_unregister
> > > > seems to confirm this as a possibility (as do the backtraces).
> > > > 
> > > > I think adding the lock will be sufficient.
> > > 
> > > Ok, so here is the patch:
> > 
> > Awesome!
> > 
> > Since you are the one who has been seeing this particular fault - any chance
> > you could give it some soak time? If I recall your emails correctly it takes
> > about a week or so before you saw the crash?
> 
> Sure. I've already installed patched kernel, will report back results
> later.

Ok, after few weeks I can surely confirm - this fixes the issue.

> > > -----------8<------------
> > > 
> > > From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00 2001
> > > From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
> > >  <marmarek@invisiblethingslab.com>
> > > Date: Fri, 26 Jun 2015 02:16:49 +0200
> > > Subject: [PATCH] xen/grant: fix race condition in gntdev_release
> > > 
> > > While gntdev_release is called, MMU notifier is still registered and
> > > can traverse priv->maps list even if no pages are mapped (which is the
> > > case - gntdev_release is called after all). But gntdev_release will
> > > clear that list, so make sure that only one of those things happens at
> > > the same time.
> > > 
> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > ---
> > >  drivers/xen/gntdev.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > > index 8927485..4bd23bb 100644
> > > --- a/drivers/xen/gntdev.c
> > > +++ b/drivers/xen/gntdev.c
> > > @@ -568,12 +568,14 @@ static int gntdev_release(struct inode *inode, struct file *flip)
> > >  
> > >  	pr_debug("priv %p\n", priv);
> > >  
> > > +	mutex_lock(&priv->lock);
> > >  	while (!list_empty(&priv->maps)) {
> > >  		map = list_entry(priv->maps.next, struct grant_map, next);
> > >  		list_del(&map->next);
> > >  		gntdev_put_map(NULL /* already removed */, map);
> > >  	}
> > >  	WARN_ON(!list_empty(&priv->freeable_maps));
> > > +	mutex_unlock(&priv->lock);
> > >  
> > >  	if (use_ptemod)
> > >  		mmu_notifier_unregister(&priv->mn, priv->mm);
> > > -- 
> > > 1.9.3
> > > 
> > > 
> > > -- 
> > > Best Regards,
> > > Marek Marczykowski-Górecki
> > > Invisible Things Lab
> > > A: Because it messes up the order in which people normally read text.
> > > Q: Why is top-posting such a bad thing?
> > 
> > 
> 



-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: race condition in xen-gntdev
  2015-07-22  3:21                   ` Marek Marczykowski-Górecki
@ 2015-07-22 13:58                     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-07-22 13:58 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Boris Ostrovsky, Daniel De Graaf, David Vrabel, xen-devel

On Wed, Jul 22, 2015 at 05:21:55AM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Jun 29, 2015 at 04:50:10PM +0200, Marek Marczykowski-Górecki wrote:
> > On Mon, Jun 29, 2015 at 10:39:26AM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jun 26, 2015 at 03:28:24AM +0200, Marek Marczykowski-Górecki wrote:
> > > > On Mon, Jun 22, 2015 at 03:14:16PM -0400, Daniel De Graaf wrote:
> > > > > The reason that gntdev_release didn't have a lock is because there are not
> > > > > supposed to be any references to the areas pointed to by priv->maps when it
> > > > > is called.  However, since the MMU notifier has not yet been unregistered,
> > > > > it is apparently possible to race here; the comment on mmu_notifier_unregister
> > > > > seems to confirm this as a possibility (as do the backtraces).
> > > > > 
> > > > > I think adding the lock will be sufficient.
> > > > 
> > > > Ok, so here is the patch:
> > > 
> > > Awesome!
> > > 
> > > Since you are the one who has been seeing this particular fault - any chance
> > > you could give it some soak time? If I recall your emails correctly it takes
> > > about a week or so before you saw the crash?
> > 
> > Sure. I've already installed patched kernel, will report back results
> > later.
> 
> Ok, after few weeks I can surely confirm - this fixes the issue.

Fantastic! I think David has already committed it in.


Thanks again for doing such a detailed analysis of the issue.
> 
> > > > -----------8<------------
> > > > 
> > > > From b876e14888bdafa112c3265e6420543fa74aa709 Mon Sep 17 00:00:00 2001
> > > > From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
> > > >  <marmarek@invisiblethingslab.com>
> > > > Date: Fri, 26 Jun 2015 02:16:49 +0200
> > > > Subject: [PATCH] xen/grant: fix race condition in gntdev_release
> > > > 
> > > > While gntdev_release is called, MMU notifier is still registered and
> > > > can traverse priv->maps list even if no pages are mapped (which is the
> > > > case - gntdev_release is called after all). But gntdev_release will
> > > > clear that list, so make sure that only one of those things happens at
> > > > the same time.
> > > > 
> > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > > ---
> > > >  drivers/xen/gntdev.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > > > index 8927485..4bd23bb 100644
> > > > --- a/drivers/xen/gntdev.c
> > > > +++ b/drivers/xen/gntdev.c
> > > > @@ -568,12 +568,14 @@ static int gntdev_release(struct inode *inode, struct file *flip)
> > > >  
> > > >  	pr_debug("priv %p\n", priv);
> > > >  
> > > > +	mutex_lock(&priv->lock);
> > > >  	while (!list_empty(&priv->maps)) {
> > > >  		map = list_entry(priv->maps.next, struct grant_map, next);
> > > >  		list_del(&map->next);
> > > >  		gntdev_put_map(NULL /* already removed */, map);
> > > >  	}
> > > >  	WARN_ON(!list_empty(&priv->freeable_maps));
> > > > +	mutex_unlock(&priv->lock);
> > > >  
> > > >  	if (use_ptemod)
> > > >  		mmu_notifier_unregister(&priv->mn, priv->mm);
> > > > -- 
> > > > 1.9.3
> > > > 
> > > > 
> > > > -- 
> > > > Best Regards,
> > > > Marek Marczykowski-Górecki
> > > > Invisible Things Lab
> > > > A: Because it messes up the order in which people normally read text.
> > > > Q: Why is top-posting such a bad thing?
> > > 
> > > 
> > 
> 
> 
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-07-22 13:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-30 14:47 gntdev/gntalloc and fork Marek Marczykowski-Górecki
2015-05-27 23:45 ` gntdev/gntalloc and fork? - crash in gntdev Marek Marczykowski-Górecki
2015-06-17 19:42   ` race condition in xen-gntdev (was: Re: gntdev/gntalloc and fork? - crash in gntdev) Marek Marczykowski-Górecki
2015-06-22 17:46     ` Konrad Rzeszutek Wilk
2015-06-22 18:13       ` Marek Marczykowski-Górecki
2015-06-22 18:37         ` Konrad Rzeszutek Wilk
2015-06-22 19:14           ` race condition in xen-gntdev Daniel De Graaf
2015-06-26  1:28             ` Marek Marczykowski-Górecki
2015-06-29 14:39               ` Konrad Rzeszutek Wilk
2015-06-29 14:50                 ` Marek Marczykowski-Górecki
2015-07-22  3:21                   ` Marek Marczykowski-Górecki
2015-07-22 13:58                     ` Konrad Rzeszutek Wilk
2015-06-30 15:26               ` David Vrabel
2015-06-30 15:51                 ` Marek Marczykowski-Górecki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).