* xen-gntdev gets stuck unmapping a 2nd page
From: Dave Scott @ 2014-09-05 21:04 UTC
To: xen-devel; +Cc: John Else, David Vrabel, Anil Madhavapeddy
Hi,
I've been playing more with vchan and I think I've hit a bug in xen-gntdev.
It works ok if I map/unmap in a single page. If I increase the buffer size
and cause vchan to map a second page (non-contiguous mappings via 2 distinct
libxc calls), the process gets stuck in disk sleep when unmapping the second
page. The same happens if I comment out the unmap and let the program exit.
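"Disk sleep" is the kernel's name for uninterruptible sleep (state `D` in `/proc/<pid>/stat` and `ps`). The following is a minimal sketch, assuming a Linux procfs, for listing every task stuck in that state, for example while test-gnt is hung. The helper names `task_state` and `list_d_state_tasks` are hypothetical and not part of any Xen or libxc API.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the state letter of task `pid` from /proc/<pid>/stat:
 * 'R' running, 'S' interruptible sleep, 'D' uninterruptible ("disk")
 * sleep, and so on. Returns '?' if the file cannot be read or parsed. */
char task_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The line looks like "pid (comm) state ...". comm may contain spaces
     * or ')', so the state follows the LAST ')' (later fields are numeric). */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Print every task currently in state D and return how many were found,
 * or -1 if /proc cannot be opened. Only thread-group leaders are checked,
 * because per-thread entries live under /proc/<pid>/task instead. */
int list_d_state_tasks(void)
{
    DIR *d = opendir("/proc");
    if (!d)
        return -1;

    int found = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* Only the numeric entries in /proc are processes. */
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;
        pid_t pid = (pid_t)atoi(e->d_name);
        if (task_state(pid) == 'D') {
            printf("pid %d is in D (disk sleep) state\n", (int)pid);
            found++;
        }
    }
    closedir(d);
    return found;
}
```

While the hang is in progress, the pids of the three tasks named in the hung-task reports from the kernel log (kworker/0:1, test-gnt and systemd-udevd) would be expected to show up in this list.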
I'm using a 3.13 kernel:
$ uname -a
Linux ubuntu 3.13.0-24-generic #46 SMP Thu Aug 28 23:05:20 BST 2014 x86_64 x86_64 x86_64 GNU/Linux
Although I'm testing vchan between two domUs, I believe this repros it
with loopback grants within a single domU:
$ cat > test-gnt.c <<EOT
#include <xenctrl.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(int argc, char* argv[]){
    uint32_t refs1, refs2;
    void *share1, *share2, *map1, *map2;
    int count = 1;
    if (argc < 2) {
        fprintf(stderr, "usage: %s <domid>\n", argv[0]);
        exit(1);
    }
    int my_domid = atoi(argv[1]);
    xc_gntshr *xshr = xc_gntshr_open(NULL, 0);
    xc_gnttab *xtab = xc_gnttab_open(NULL, 0);
    if (!xshr || !xtab)
        goto fail;
    /* Share two separate pages with ourselves (loopback grants) */
    share1 = xc_gntshr_share_pages(xshr, my_domid, count, &refs1, 1);
    share2 = xc_gntshr_share_pages(xshr, my_domid, count, &refs2, 1);
    if (!share1 || !share2)
        goto fail;
    /* Map each grant with its own libxc call: two non-contiguous mappings */
    map1 = xc_gnttab_map_grant_ref(xtab, my_domid, refs1, PROT_READ);
    map2 = xc_gnttab_map_grant_ref(xtab, my_domid, refs2, PROT_READ);
    if (!map1 || !map2)
        goto fail;
    fprintf(stderr, "src=%p ref=%"PRIu32" dest=%p\n", share1, refs1, map1);
    fprintf(stderr, "src=%p ref=%"PRIu32" dest=%p\n", share2, refs2, map2);
    xc_gnttab_munmap(xtab, map2, count);
    fprintf(stderr, "Unmapped first page\n"); fflush(stderr);
    /* This call never completes: */
    xc_gnttab_munmap(xtab, map1, count);
    fprintf(stderr, "Unmapped second page\n"); fflush(stderr);
    exit(0);
fail:
    perror(NULL);
    exit(1);
}
EOT
$ gcc -o test-gnt test-gnt.c -lxenctrl
$ sudo ./test-gnt $(sudo xenstore-read domid)
src=0x7f74080bd000 ref=54 dest=0x7f74080bb000
src=0x7f74080bc000 ref=85 dest=0x7f74080ba000
Unmapped first page
<now it's dead>
From the logs it looks like free_xenballooned_pages blocks forever:
[ 720.176089] INFO: task kworker/0:1:27 blocked for more than 120 seconds.
[ 720.176101] Not tainted 3.13.0-24-generic #46
[ 720.176105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 720.176110] kworker/0:1 D 0000000000000000 0 27 2 0x00000000
[ 720.176118] Workqueue: events balloon_process
[ 720.176120] ffff88003db85b20 0000000000000202 ffff88003db597f0 ffff88003db85fd8
[ 720.176123] 0000000000014440 0000000000014440 ffff88003db597f0 ffff88003c6cfc98
[ 720.176125] ffff88003c6cfc9c ffff88003db597f0 00000000ffffffff ffff88003c6cfca0
[ 720.176127] Call Trace:
[ 720.176133] [<ffffffff8171a3a9>] schedule_preempt_disabled+0x29/0x70
[ 720.176136] [<ffffffff8171c215>] __mutex_lock_slowpath+0x135/0x1b0
[ 720.176139] [<ffffffff8171c2af>] mutex_lock+0x1f/0x2f
[ 720.176142] [<ffffffff8148e92d>] device_attach+0x1d/0xa0
[ 720.176145] [<ffffffff8148dda8>] bus_probe_device+0x98/0xc0
[ 720.176147] [<ffffffff8148bc05>] device_add+0x4c5/0x640
[ 720.176149] [<ffffffff8148bd9a>] device_register+0x1a/0x20
[ 720.176158] [<ffffffff814a2370>] init_memory_block+0xd0/0xf0
[ 720.176161] [<ffffffff814a24b1>] register_new_memory+0x91/0xa0
[ 720.176164] [<ffffffff81705de0>] __add_pages+0x140/0x240
[ 720.176167] [<ffffffff81055649>] arch_add_memory+0x59/0xd0
[ 720.176170] [<ffffffff817060b4>] add_memory+0xe4/0x1f0
[ 720.176172] [<ffffffff8142c2d2>] balloon_process+0x382/0x420
[ 720.176175] [<ffffffff810838a2>] process_one_work+0x182/0x450
[ 720.176178] [<ffffffff81084641>] worker_thread+0x121/0x410
[ 720.176180] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
[ 720.176183] [<ffffffff8108b312>] kthread+0xd2/0xf0
[ 720.176185] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 720.176188] [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
[ 720.176190] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 720.176197] INFO: task test-gnt:957 blocked for more than 120 seconds.
[ 720.176203] Not tainted 3.13.0-24-generic #46
[ 720.176206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 720.176210] test-gnt D 0000000000000000 0 957 954 0x00000001
[ 720.176213] ffff8800053f3d70 0000000000000202 ffff88003a3847d0 ffff8800053f3fd8
[ 720.176215] 0000000000014440 0000000000014440 ffff88003a3847d0 ffffffff81c99740
[ 720.176217] ffffffff81c99744 ffff88003a3847d0 00000000ffffffff ffffffff81c99748
[ 720.176219] Call Trace:
[ 720.176222] [<ffffffff8171a3a9>] schedule_preempt_disabled+0x29/0x70
[ 720.176224] [<ffffffff8171c215>] __mutex_lock_slowpath+0x135/0x1b0
[ 720.176226] [<ffffffff8171c2af>] mutex_lock+0x1f/0x2f
[ 720.176229] [<ffffffff8142bb4e>] free_xenballooned_pages+0x1e/0x90
[ 720.176236] [<ffffffffa000c586>] gntdev_free_map+0x26/0x60 [xen_gntdev]
[ 720.176238] [<ffffffffa000c6f8>] gntdev_put_map+0xa8/0x100 [xen_gntdev]
[ 720.176241] [<ffffffffa000d2e2>] gntdev_ioctl+0x442/0x750 [xen_gntdev]
[ 720.176245] [<ffffffff811cc6e0>] do_vfs_ioctl+0x2e0/0x4c0
[ 720.176248] [<ffffffff81079e42>] ? ptrace_notify+0x82/0xc0
[ 720.176250] [<ffffffff811cc941>] SyS_ioctl+0x81/0xa0
[ 720.176252] [<ffffffff8172663f>] tracesys+0xe1/0xe6
[ 720.176254] INFO: task systemd-udevd:958 blocked for more than 120 seconds.
[ 720.176258] Not tainted 3.13.0-24-generic #46
[ 720.176261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 720.176265] systemd-udevd D 0000000000000000 0 958 269 0x00000004
[ 720.176267] ffff88003bb0dd20 0000000000000202 ffff88003a382fe0 ffff88003bb0dfd8
[ 720.176269] 0000000000014440 0000000000014440 ffff88003a382fe0 ffffffff81c62040
[ 720.176271] ffffffff81c62044 ffff88003a382fe0 00000000ffffffff ffffffff81c62048
[ 720.176273] Call Trace:
[ 720.176276] [<ffffffff8171a3a9>] schedule_preempt_disabled+0x29/0x70
[ 720.176278] [<ffffffff8171c215>] __mutex_lock_slowpath+0x135/0x1b0
[ 720.176280] [<ffffffff8171c2af>] mutex_lock+0x1f/0x2f
[ 720.176282] [<ffffffff81706ab3>] online_pages+0x33/0x570
[ 720.176285] [<ffffffff814a2108>] memory_subsys_online+0x68/0xd0
[ 720.176287] [<ffffffff8148c555>] device_online+0x65/0x90
[ 720.176289] [<ffffffff814a1d94>] store_mem_state+0x64/0x160
[ 720.176291] [<ffffffff81489ab8>] dev_attr_store+0x18/0x30
[ 720.176295] [<ffffffff8122f418>] sysfs_write_file+0x128/0x1c0
[ 720.176297] [<ffffffff811b9534>] vfs_write+0xb4/0x1f0
[ 720.176300] [<ffffffff811b9f69>] SyS_write+0x49/0xa0
[ 720.176302] [<ffffffff8172663f>] tracesys+0xe1/0xe6
Presumably the same problem would affect a userspace block backend using
grant mapping, like qemu disk? Or maybe I’m just doing it wrong (always possible!) :-)
Cheers,
Dave
* Re: xen-gntdev gets stuck unmapping a 2nd page
From: David Vrabel @ 2014-09-10 12:41 UTC
To: Dave Scott, xen-devel; +Cc: John Else, David Vrabel, Anil Madhavapeddy
On 05/09/14 22:04, Dave Scott wrote:
> Hi,
>
> I've been playing more with vchan and I think I've hit a bug in xen-gntdev.
> It works ok if I map/unmap in a single page. If I increase the buffer size
> and cause vchan to map a second page (non-contiguous mappings via 2 distinct
> libxc calls), the process gets stuck in disk sleep when unmapping the second
> page. The same happens if I comment out the unmap and let the program exit.
>
> I'm using a 3.13 kernel:
Can you try the latest kernel? This looks like it may be a bug in the
generic memory hotplug device driver rather than something specific to
gntdev.
David
* Re: xen-gntdev gets stuck unmapping a 2nd page
From: Dave Scott @ 2014-09-10 12:46 UTC
To: David Vrabel; +Cc: xen-devel, John Else, Anil Madhavapeddy
On 10 Sep 2014, at 13:41, David Vrabel <david.vrabel@citrix.com> wrote:
> On 05/09/14 22:04, Dave Scott wrote:
>> Hi,
>>
>> I've been playing more with vchan and I think I've hit a bug in xen-gntdev.
>> It works ok if I map/unmap in a single page. If I increase the buffer size
>> and cause vchan to map a second page (non-contiguous mappings via 2 distinct
>> libxc calls), the process gets stuck in disk sleep when unmapping the second
>> page. The same happens if I comment out the unmap and let the program exit.
>>
>> I'm using a 3.13 kernel:
>
> Can you try the latest kernel? This looks like it may be a bug in the
> generic memory hotplug device driver rather than something specific to
> gntdev.
OK, will give it a go.
Thanks,
Dave
* Re: xen-gntdev gets stuck unmapping a 2nd page
From: Dave Scott @ 2014-09-12 11:13 UTC
To: David Vrabel; +Cc: xen-devel, John Else, Anil Madhavapeddy
On 10 Sep 2014, at 13:41, David Vrabel <david.vrabel@citrix.com> wrote:
> On 05/09/14 22:04, Dave Scott wrote:
>> Hi,
>>
>> I've been playing more with vchan and I think I've hit a bug in xen-gntdev.
>> It works ok if I map/unmap in a single page. If I increase the buffer size
>> and cause vchan to map a second page (non-contiguous mappings via 2 distinct
>> libxc calls), the process gets stuck in disk sleep when unmapping the second
>> page. The same happens if I comment out the unmap and let the program exit.
>>
>> I'm using a 3.13 kernel:
>
> Can you try the latest kernel? This looks like it may be a bug in the
> generic memory hotplug device driver rather than something specific to
> gntdev.
On a 3.17.0-rc4 (is that new enough?) it fails a bit differently:
$ sudo xenstore-read domid
9
$ sudo ./test-gnt 9
src=0x7fd8e9600000 ref=880 dest=0x7fd8e95fe000
src=0x7fd8e95ff000 ref=879 dest=0x7fd8e95fd000
Unmapped first page
<- previously it blocked here
Unmapped second page
<- now it hangs here (2 out of 2 times so far) and hits a NULL pointer dereference:
Sep 12 12:03:47 ubuntu kernel: [ 136.738101] test-gnt[1192]: segfault at 0 ip 00007f4dc7160407 sp 00007fff65bf0260 error 4 in libc-2.19.so[7f4dc7123000+1bb000]
Sep 12 12:05:03 ubuntu kernel: [ 212.021101] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
Sep 12 12:05:03 ubuntu kernel: [ 212.021161] IP: [<ffffffff81177420>] tlb_flush_mmu_free+0x20/0x50
Sep 12 12:05:03 ubuntu kernel: [ 212.021202] PGD 0
Sep 12 12:05:03 ubuntu kernel: [ 212.021217] Oops: 0000 [#1] SMP
Sep 12 12:05:03 ubuntu kernel: [ 212.021241] Modules linked in: xen_gntalloc xen_gntdev ext2 ppdev microcode serio_raw i2c_piix4 parport_pc joydev mac_hid lp parport hid_generic usbhid hid psmouse floppy
Sep 12 12:05:03 ubuntu kernel: [ 212.021371] CPU: 0 PID: 1225 Comm: sudo Not tainted 3.17.0-rc4 #4
Sep 12 12:05:03 ubuntu kernel: [ 212.021407] Hardware name: Xen HVM domU, BIOS 4.4.0-xs88167-d 09/07/2014
Sep 12 12:05:03 ubuntu kernel: [ 212.021446] task: ffff88002fbd8000 ti: ffff880024a04000 task.ti: ffff880024a04000
Sep 12 12:05:03 ubuntu kernel: [ 212.021489] RIP: 0010:[<ffffffff81177420>] [<ffffffff81177420>] tlb_flush_mmu_free+0x20/0x50
Sep 12 12:05:03 ubuntu kernel: [ 212.021541] RSP: 0018:ffff880024a07ce8 EFLAGS: 00010286
Sep 12 12:05:03 ubuntu kernel: [ 212.021572] RAX: ffffffffffffffff RBX: ffffffffffffffff RCX: ffff880024a07c68
Sep 12 12:05:03 ubuntu kernel: [ 212.021613] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880024a07c68
Sep 12 12:05:03 ubuntu kernel: [ 212.021654] RBP: ffff880024a07d00 R08: ffff880025c772d0 R09: 0000000000000000
Sep 12 12:05:03 ubuntu kernel: [ 212.021695] R10: ffffea0000bb1e80 R11: 0000000000000206 R12: ffff880024a07e58
Sep 12 12:05:03 ubuntu kernel: [ 212.021736] R13: ffff880024a07e30 R14: 00007f4867e72000 R15: ffff880024a07e30
Sep 12 12:05:03 ubuntu kernel: [ 212.021782] FS: 00007f486a257840(0000) GS:ffff88003f400000(0000) knlGS:0000000000000000
Sep 12 12:05:03 ubuntu kernel: [ 212.021828] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 12 12:05:03 ubuntu kernel: [ 212.021863] CR2: 0000000000000007 CR3: 000000002f7a2000 CR4: 00000000000006f0
Sep 12 12:05:03 ubuntu kernel: [ 212.021905] Stack:
Sep 12 12:05:03 ubuntu kernel: [ 212.021918] 00007f4868086000 ffffea0000a70ac0 00007f4867e78000 ffff880024a07de0
Sep 12 12:05:03 ubuntu kernel: [ 212.021967] ffffffff81178d0d 00007f4867e77fff 00007f4867e77fff ffff88002f7a27f0
Sep 12 12:05:03 ubuntu kernel: [ 212.022017] 00007f4867e78000 ffff88002ec70908 00007f4867e78000 00007f4867e77fff
Sep 12 12:05:03 ubuntu kernel: [ 212.022066] Call Trace:
Sep 12 12:05:03 ubuntu kernel: [ 212.022084] [<ffffffff81178d0d>] unmap_single_vma+0x50d/0x8a0
Sep 12 12:05:03 ubuntu kernel: [ 212.022119] [<ffffffff81179af9>] unmap_vmas+0x49/0x90
Sep 12 12:05:03 ubuntu kernel: [ 212.022151] [<ffffffff8117ef6d>] unmap_region+0x9d/0x110
Sep 12 12:05:03 ubuntu kernel: [ 212.022183] [<ffffffff8117ecb8>] ? vma_gap_callbacks_rotate+0x18/0x30
Sep 12 12:05:03 ubuntu kernel: [ 212.022222] [<ffffffff81181288>] do_munmap+0x228/0x3b0
Sep 12 12:05:03 ubuntu kernel: [ 212.022253] [<ffffffff81181451>] vm_munmap+0x41/0x60
Sep 12 12:05:03 ubuntu kernel: [ 212.022284] [<ffffffff811823a2>] SyS_munmap+0x22/0x30
Sep 12 12:05:03 ubuntu kernel: [ 212.022316] [<ffffffff816f4ce9>] system_call_fastpath+0x16/0x1b
Sep 12 12:05:03 ubuntu kernel: [ 212.022350] Code: 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 49 89 fd 41 54 4c 8d 67 28 4d 85 e4 53 4c 89 e3 74 1e 0f 1f 00 <8b> 73 08 48 8d 7b 10 e8 74 4e 01 00 c7 43 08 00 00 00 00 48 8b
Sep 12 12:05:03 ubuntu kernel: [ 212.025083] RIP [<ffffffff81177420>] tlb_flush_mmu_free+0x20/0x50
Sep 12 12:05:03 ubuntu kernel: [ 212.025083] RSP <ffff880024a07ce8>
Sep 12 12:05:03 ubuntu kernel: [ 212.025083] CR2: 0000000000000007
Sep 12 12:05:03 ubuntu kernel: [ 212.028713] ---[ end trace c15bddd8545d0fb1 ]---
Cheers,
Dave