gntdev/gntalloc and fork

From: Marek Marczykowski-Górecki @ 2015-04-30 14:47 UTC (permalink / raw)
  To: xen-devel

Hi,

What is the proper way to handle shared pages (on either side, i.e. using
gntdev or gntalloc) across fork and a possible later exec? The child
process does not need to access those pages in any way, but it will map
different one(s), using a newly opened FD to the gntdev/gntalloc device.
Should the child unmap the inherited pages and close the inherited FD to
the device manually right after the fork? Or should a process using
gntdev or gntalloc avoid fork altogether?
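
For illustration, the two approaches asked about could look roughly like
this in C. This is only a sketch under assumptions, not a statement of what
gntdev/gntalloc actually require: struct shared_region and both helper
names are hypothetical, and only the standard calls munmap(), close(),
madvise(MADV_DONTFORK) and fcntl(FD_CLOEXEC) are used. Whether either
approach is necessary or sufficient for /dev/xen/gnt* mappings is exactly
the open question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one shared region (not a Xen structure). */
struct shared_region {
    int fd;       /* FD the region was mapped from, or -1 if none */
    void *addr;   /* start of the mapping, or NULL if none */
    size_t len;   /* length of the mapping in bytes */
};

/*
 * Approach 1: call in the child right after fork().
 * Unmaps the inherited mapping and closes the inherited FD.
 * Returns 0 on success, -1 if munmap() or close() failed.
 */
int region_drop_in_child(struct shared_region *r)
{
    int rc = 0;

    assert(r != NULL);
    if (r->addr != NULL) {
        if (munmap(r->addr, r->len) != 0)
            rc = -1;
        r->addr = NULL;
        r->len = 0;
    }
    if (r->fd >= 0) {
        if (close(r->fd) != 0)
            rc = -1;
        r->fd = -1;
    }
    return rc;
}

/*
 * Approach 2: call in the parent before fork().
 * Asks the kernel not to copy the mapping into children (MADV_DONTFORK)
 * and marks the FD close-on-exec. The FD itself is still inherited by
 * fork(); it is only closed automatically at exec time.
 * Expects r->addr and r->fd to be valid. Returns 0 on success, -1 on error.
 */
int region_hide_from_children(const struct shared_region *r)
{
    int flags;

    assert(r != NULL);
    if (madvise(r->addr, r->len, MADV_DONTFORK) != 0)
        return -1;
    flags = fcntl(r->fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(r->fd, F_SETFD, flags | FD_CLOEXEC) == 0 ? 0 : -1;
}
```

A child that never execs would still need the first helper to get rid of
the inherited FD, since close-on-exec has no effect at fork time.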

I'm asking because I get a kernel oops (below) in the context of such a
process. This process uses both gntdev and gntalloc. The PID reported
there is a child, which maps additional pages (using a newly opened FD to
/dev/xen/gnt*), but I'm not sure whether the crash happens before, during
or after this second mapping (actually a vchan connection), or maybe even
during cleanup of this second mapping. The parent process keeps its
mappings for the whole lifetime of its child. I don't have a 100% reliable
way to reproduce this problem, but it happens quite often when I run such
operations in a loop.

The kernel is vanilla 3.19.3, running on Xen 4.4.2.

The kernel message:
[74376.073464] general protection fault: 0000 [#1] SMP 
[74376.073475] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
[74376.073522] CPU: 1 PID: 9377 Comm: qrexec-agent Tainted: G           O   3.19.3-4.pvops.qubes.x86_64 #1
[74376.073528] task: ffff880002442e40 ti: ffff88000032c000 task.ti: ffff88000032c000
[74376.073532] RIP: e030:[<ffffffffa00952c5>]  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
[74376.073543] RSP: e02b:ffff88000032fc08  EFLAGS: 00010292
[74376.073546] RAX: 0000000000000000 RBX: dead000000100100 RCX: 00007fd8616ea000
[74376.073550] RDX: 00007fd8616ea000 RSI: 00007fd8616e9000 RDI: dead000000100100
[74376.073554] RBP: ffff88000032fc48 R08: 0000000000000000 R09: 0000000000000000
[74376.073557] R10: ffffea000021bb00 R11: 0000000000000000 R12: 00007fd8616e9000
[74376.073561] R13: 00007fd8616ea000 R14: ffff880012702e40 R15: ffff880012702e70
[74376.073569] FS:  00007fd8616ca700(0000) GS:ffff880013c80000(0000) knlGS:0000000000000000
[74376.073574] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[74376.073577] CR2: 00007fd8616e9458 CR3: 00000000e7af5000 CR4: 0000000000042660
[74376.073582] Stack:
[74376.073584]  ffff8800188356c0 00000000000000d0 ffff88000032fc68 00000000c64ef797
[74376.073590]  0000000000000220 dead000000100100 00007fd8616e9000 00007fd8616ea000
[74376.073596]  ffff88000032fc88 ffffffffa00953c6 ffff88000032fcc8 ffff880012702e70
[74376.073603] Call Trace:
[74376.073610]  [<ffffffffa00953c6>] mn_invl_range_start+0x46/0x90 [xen_gntdev]
[74376.073620]  [<ffffffff811e88fb>] __mmu_notifier_invalidate_range_start+0x5b/0x90
[74376.073627]  [<ffffffff811c2a59>] do_wp_page+0x769/0x820
[74376.074031]  [<ffffffff811c4f5c>] handle_mm_fault+0x7fc/0x10c0
[74376.074031]  [<ffffffff813864cd>] ? radix_tree_lookup+0xd/0x10
[74376.074031]  [<ffffffff81061e1c>] __do_page_fault+0x1dc/0x5a0
[74376.074031]  [<ffffffff817560a6>] ? mutex_lock+0x16/0x37
[74376.074031]  [<ffffffffa0008928>] ? evtchn_ioctl+0x118/0x3c0 [xen_evtchn]
[74376.074031]  [<ffffffff812209d8>] ? do_vfs_ioctl+0x2f8/0x4f0
[74376.074031]  [<ffffffff811cafdf>] ? do_munmap+0x29f/0x3b0
[74376.074031]  [<ffffffff81062211>] do_page_fault+0x31/0x70
[74376.074031]  [<ffffffff81759e28>] page_fault+0x28/0x30
[74376.074031] Code: e9 dd fd ff ff 31 c9 31 db e9 20 fe ff ff 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 28 <48> 8b 47 10 48 85 c0 74 4e 4c 8b 00 49 39 d0 73 46 4c 8b 48 08
[74376.074031] RIP  [<ffffffffa00952c5>] unmap_if_in_range+0x15/0xd0 [xen_gntdev]
[74376.074031]  RSP <ffff88000032fc08>
[74376.091682] ---[ end trace 2b21c5b714eb1071 ]---
[74404.069009] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [qrexec-agent:9379]
[74404.069009] Modules linked in: fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul crc32c_intel pcspkr xen_netfront ghash_clmulni_intel nfsd auth_rpcgss nfs_acl lockd grace xenfs xen_privcmd dummy_hcd udc_core xen_gntdev xen_gntalloc xen_blkback sunrpc u2mfn(O) xen_evtchn xen_blkfront
[74404.069009] CPU: 2 PID: 9379 Comm: qrexec-agent Tainted: G      D    O   3.19.3-4.pvops.qubes.x86_64 #1
[74404.069009] task: ffff880010e24a00 ti: ffff880002470000 task.ti: ffff880002470000
[74404.069009] RIP: e030:[<ffffffff81757b11>]  [<ffffffff81757b11>] _raw_spin_lock+0x21/0x30
[74404.069009] RSP: e02b:ffff880002473e18  EFLAGS: 00000297
[74404.069009] RAX: 0000000000000040 RBX: ffff880002345c00 RCX: 0000000000018cf8
[74404.069009] RDX: 0000000000000041 RSI: ffff880002345c00 RDI: ffff880012702e60
[74404.069009] RBP: ffff880002473e18 R08: ffff880012702240 R09: 00000001802a0019
[74404.069009] R10: ffffea000049c080 R11: ffffffffa00955bf R12: ffff880012702e70
[74404.069009] R13: ffff880012702e40 R14: ffff8800132c6f20 R15: ffff880012b163c0
[74404.069009] FS:  00007fd8616ca700(0000) GS:ffff880013d00000(0000) knlGS:0000000000000000
[74404.069009] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[74404.069009] CR2: 00007fd8610be098 CR3: 000000000b971000 CR4: 0000000000042660
[74404.069009] Stack:
[74404.069009]  ffff880002473e48 ffffffffa0095452 ffff880002473e48 ffff880002345c00
[74404.069009]  ffff880012702e70 0000000000000000 ffff880002473e78 ffffffff811e8c2e
[74404.069009]  ffff880002473e78 ffff880012702e40 ffff880012702e40 ffff880012d123c8
[74404.069009] Call Trace:
[74404.069009]  [<ffffffffa0095452>] mn_release+0x22/0x130 [xen_gntdev]
[74404.069009]  [<ffffffff811e8c2e>] mmu_notifier_unregister+0x4e/0xe0
[74404.069009]  [<ffffffffa00957c0>] gntdev_release+0x60/0xa0 [xen_gntdev]
[74404.069009]  [<ffffffff8120ec0f>] __fput+0xdf/0x1e0
[74404.069009]  [<ffffffff8120ed5e>] ____fput+0xe/0x10
[74404.069009]  [<ffffffff810b56df>] task_work_run+0xbf/0x100
[74404.069009]  [<ffffffff81014c47>] do_notify_resume+0x97/0xb0
[74404.069009]  [<ffffffff81758127>] int_signal+0x12/0x17
[74404.069009] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 75 04 5d c3 f3 90 0f b6 07 <38> d0 75 f7 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
