* xen-gntdev gets stuck unmapping a 2nd page
From: Dave Scott @ 2014-09-05 21:04 UTC
  To: xen-devel; +Cc: John Else, David Vrabel, Anil Madhavapeddy

Hi,

I've been playing more with vchan and I think I've hit a bug in xen-gntdev.
It works ok if I map/unmap in a single page. If I increase the buffer size
and cause vchan to map a second page (non-contiguous mappings via 2 distinct
libxc calls), the process gets stuck in disk sleep when unmapping the second
page. The same happens if I comment out the unmap and let the program exit.

I'm using a 3.13 kernel:

$ uname -a
Linux ubuntu 3.13.0-24-generic #46 SMP Thu Aug 28 23:05:20 BST 2014 x86_64 x86_64 x86_64 GNU/Linux

Although I'm testing vchan between two domUs, I believe this repros it
with loopback grants within a single domU:

$ cat > test-gnt.c <<EOT
#include <xenctrl.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

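int main(int argc, char* argv[]){
    /* The domid is required; bail out instead of dereferencing argv[1]. */
    if (argc < 2) {
        fprintf(stderr, "usage: %s <domid>\n", argv[0]);
        exit(1);
    }
    int my_domid = atoi(argv[1]);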
    uint32_t refs1, refs2;
    void *share1, *share2, *map1, *map2;
    int count = 1;
    xc_gntshr *xshr = xc_gntshr_open(NULL, 0);
    xc_gnttab *xtab = xc_gnttab_open(NULL, 0);
    if (!xshr || !xtab)
        goto fail;

    share1 = xc_gntshr_share_pages(xshr, my_domid, count, &refs1, 1);
    share2 = xc_gntshr_share_pages(xshr, my_domid, count, &refs2, 1);
    if (!share1 || !share2)
        goto fail;
    map1 = xc_gnttab_map_grant_ref(xtab, my_domid, refs1, PROT_READ);
    map2 = xc_gnttab_map_grant_ref(xtab, my_domid, refs2, PROT_READ);
    /* Do not try to unmap a mapping that failed. */
    if (!map1 || !map2)
        goto fail;
    fprintf(stderr, "src=%p ref=%"PRIu32" dest=%p\n", share1, refs1, map1);
    fprintf(stderr, "src=%p ref=%"PRIu32" dest=%p\n", share2, refs2, map2);
    xc_gnttab_munmap(xtab, map2, count);
    fprintf(stderr, "Unmapped first page\n"); fflush(stderr);

    /* This call never completes: */
    xc_gnttab_munmap(xtab, map1, count);
    fprintf(stderr, "Unmapped second page\n"); fflush(stderr);
    exit(0);
fail:
    perror(NULL);
    exit(1);
}
EOT
$ gcc -o test-gnt test-gnt.c -lxenctrl
$ sudo ./test-gnt $(sudo xenstore-read domid)
src=0x7f74080bd000 ref=54 dest=0x7f74080bb000
src=0x7f74080bc000 ref=85 dest=0x7f74080ba000
Unmapped first page
<now it's dead>

From the logs it looks like free_xenballooned_pages blocks forever:

[  720.176089] INFO: task kworker/0:1:27 blocked for more than 120 seconds.
[  720.176101]       Not tainted 3.13.0-24-generic #46
[  720.176105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  720.176110] kworker/0:1     D 0000000000000000     0    27      2 0x00000000
[  720.176118] Workqueue: events balloon_process
[  720.176120]  ffff88003db85b20 0000000000000202 ffff88003db597f0 ffff88003db85fd8
[  720.176123]  0000000000014440 0000000000014440 ffff88003db597f0 ffff88003c6cfc98
[  720.176125]  ffff88003c6cfc9c ffff88003db597f0 00000000ffffffff ffff88003c6cfca0
[  720.176127] Call Trace:
[  720.176133]  [<ffffffff8171a3a9>] schedule_preempt_disabled+0x29/0x70
[  720.176136]  [<ffffffff8171c215>] __mutex_lock_slowpath+0x135/0x1b0
[  720.176139]  [<ffffffff8171c2af>] mutex_lock+0x1f/0x2f
[  720.176142]  [<ffffffff8148e92d>] device_attach+0x1d/0xa0
[  720.176145]  [<ffffffff8148dda8>] bus_probe_device+0x98/0xc0
[  720.176147]  [<ffffffff8148bc05>] device_add+0x4c5/0x640
[  720.176149]  [<ffffffff8148bd9a>] device_register+0x1a/0x20
[  720.176158]  [<ffffffff814a2370>] init_memory_block+0xd0/0xf0
[  720.176161]  [<ffffffff814a24b1>] register_new_memory+0x91/0xa0
[  720.176164]  [<ffffffff81705de0>] __add_pages+0x140/0x240
[  720.176167]  [<ffffffff81055649>] arch_add_memory+0x59/0xd0
[  720.176170]  [<ffffffff817060b4>] add_memory+0xe4/0x1f0
[  720.176172]  [<ffffffff8142c2d2>] balloon_process+0x382/0x420
[  720.176175]  [<ffffffff810838a2>] process_one_work+0x182/0x450
[  720.176178]  [<ffffffff81084641>] worker_thread+0x121/0x410
[  720.176180]  [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
[  720.176183]  [<ffffffff8108b312>] kthread+0xd2/0xf0
[  720.176185]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[  720.176188]  [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
[  720.176190]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[  720.176197] INFO: task test-gnt:957 blocked for more than 120 seconds.
[  720.176203]       Not tainted 3.13.0-24-generic #46
[  720.176206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  720.176210] test-gnt        D 0000000000000000     0   957    954 0x00000001
[  720.176213]  ffff8800053f3d70 0000000000000202 ffff88003a3847d0 ffff8800053f3fd8
[  720.176215]  0000000000014440 0000000000014440 ffff88003a3847d0 ffffffff81c99740
[  720.176217]  ffffffff81c99744 ffff88003a3847d0 00000000ffffffff ffffffff81c99748
[  720.176219] Call Trace:
[  720.176222]  [<ffffffff8171a3a9>] schedule_preempt_disabled+0x29/0x70
[  720.176224]  [<ffffffff8171c215>] __mutex_lock_slowpath+0x135/0x1b0
[  720.176226]  [<ffffffff8171c2af>] mutex_lock+0x1f/0x2f
[  720.176229]  [<ffffffff8142bb4e>] free_xenballooned_pages+0x1e/0x90
[  720.176236]  [<ffffffffa000c586>] gntdev_free_map+0x26/0x60 [xen_gntdev]
[  720.176238]  [<ffffffffa000c6f8>] gntdev_put_map+0xa8/0x100 [xen_gntdev]
[  720.176241]  [<ffffffffa000d2e2>] gntdev_ioctl+0x442/0x750 [xen_gntdev]
[  720.176245]  [<ffffffff811cc6e0>] do_vfs_ioctl+0x2e0/0x4c0
[  720.176248]  [<ffffffff81079e42>] ? ptrace_notify+0x82/0xc0
[  720.176250]  [<ffffffff811cc941>] SyS_ioctl+0x81/0xa0
[  720.176252]  [<ffffffff8172663f>] tracesys+0xe1/0xe6
[  720.176254] INFO: task systemd-udevd:958 blocked for more than 120 seconds.
[  720.176258]       Not tainted 3.13.0-24-generic #46
[  720.176261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  720.176265] systemd-udevd   D 0000000000000000     0   958    269 0x00000004
[  720.176267]  ffff88003bb0dd20 0000000000000202 ffff88003a382fe0 ffff88003bb0dfd8
[  720.176269]  0000000000014440 0000000000014440 ffff88003a382fe0 ffffffff81c62040
[  720.176271]  ffffffff81c62044 ffff88003a382fe0 00000000ffffffff ffffffff81c62048
[  720.176273] Call Trace:
[  720.176276]  [<ffffffff8171a3a9>] schedule_preempt_disabled+0x29/0x70
[  720.176278]  [<ffffffff8171c215>] __mutex_lock_slowpath+0x135/0x1b0
[  720.176280]  [<ffffffff8171c2af>] mutex_lock+0x1f/0x2f
[  720.176282]  [<ffffffff81706ab3>] online_pages+0x33/0x570
[  720.176285]  [<ffffffff814a2108>] memory_subsys_online+0x68/0xd0
[  720.176287]  [<ffffffff8148c555>] device_online+0x65/0x90
[  720.176289]  [<ffffffff814a1d94>] store_mem_state+0x64/0x160
[  720.176291]  [<ffffffff81489ab8>] dev_attr_store+0x18/0x30
[  720.176295]  [<ffffffff8122f418>] sysfs_write_file+0x128/0x1c0
[  720.176297]  [<ffffffff811b9534>] vfs_write+0xb4/0x1f0
[  720.176300]  [<ffffffff811b9f69>] SyS_write+0x49/0xa0
[  720.176302]  [<ffffffff8172663f>] tracesys+0xe1/0xe6
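
One possible reading of the three traces above is a lock cycle between the
balloon worker and systemd-udevd, with test-gnt queued behind the worker.
The sketch below only models that reading. The lock names and the "who holds
what" table are assumptions inferred from the function names in the traces
(balloon_process/add_memory on one side, device_online/online_pages on the
other); the logs do not state which mutex each task owns.

```c
#include <assert.h>

/* Locks, named by guesswork from the functions in the traces (assumption). */
enum lock_id { BALLOON_MUTEX, MEM_HOTPLUG_LOCK, MEMBLOCK_DEV_LOCK, NLOCKS,
               NO_LOCK = -1 };

/* The three tasks from the hung-task report. */
enum task_id { KWORKER, TEST_GNT, UDEVD, NTASKS, NO_TASK = -1 };

/* owner[l]: task assumed to hold lock l (hypothetical, not in the logs). */
static const enum task_id owner[NLOCKS] = {
    [BALLOON_MUTEX]     = KWORKER, /* taken in balloon_process */
    [MEM_HOTPLUG_LOCK]  = KWORKER, /* taken in add_memory */
    [MEMBLOCK_DEV_LOCK] = UDEVD,   /* taken in device_online */
};

/* waits_for[t]: lock that task t is blocked on in mutex_lock. */
static const enum lock_id waits_for[NTASKS] = {
    [KWORKER]  = MEMBLOCK_DEV_LOCK, /* device_attach */
    [TEST_GNT] = BALLOON_MUTEX,     /* free_xenballooned_pages */
    [UDEVD]    = MEM_HOTPLUG_LOCK,  /* online_pages */
};

/* Returns 1 if following "waits for the owner of" leads back to start. */
int in_wait_cycle(enum task_id start)
{
    enum task_id t = start;
    for (int hops = 0; hops < NTASKS; hops++) {
        assert(t >= 0 && t < NTASKS);
        enum lock_id l = waits_for[t];
        if (l == NO_LOCK)
            return 0;
        t = owner[l];
        if (t == NO_TASK)
            return 0;
        if (t == start)
            return 1;
    }
    return 0;
}

/* Returns 1 if the wait chain from start never reaches a runnable task.
 * Each task waits on at most one lock, so a chain that is still going
 * after NTASKS hops must have entered a cycle. */
int is_stuck(enum task_id start)
{
    enum task_id t = start;
    for (int hops = 0; hops < NTASKS; hops++) {
        enum lock_id l = waits_for[t];
        if (l == NO_LOCK)
            return 0;
        t = owner[l];
        if (t == NO_TASK)
            return 0;
    }
    return 1;
}
```

Under these assumptions, in_wait_cycle(KWORKER) and in_wait_cycle(UDEVD)
both report a cycle, while test-gnt is not part of the cycle itself but
is_stuck(TEST_GNT) is still true because it waits on a lock the worker
never releases.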

Presumably the same problem would affect a userspace block backend using
grant mapping, like qemu disk? Or maybe I’m just doing it wrong (always possible!) :-)

Cheers,
Dave


* Re: xen-gntdev gets stuck unmapping a 2nd page
From: David Vrabel @ 2014-09-10 12:41 UTC
  To: Dave Scott, xen-devel; +Cc: John Else, David Vrabel, Anil Madhavapeddy

On 05/09/14 22:04, Dave Scott wrote:
> Hi,
> 
> I've been playing more with vchan and I think I've hit a bug in xen-gntdev.
> It works ok if I map/unmap in a single page. If I increase the buffer size
> and cause vchan to map a second page (non-contiguous mappings via 2 distinct
> libxc calls), the process gets stuck in disk sleep when unmapping the second
> page. The same happens if I comment out the unmap and let the program exit.
> 
> I'm using a 3.13 kernel:

Can you try the latest kernel?  This looks like it may be a bug in the
generic memory hotplug device driver rather than something specific to
gntdev.

David


* Re: xen-gntdev gets stuck unmapping a 2nd page
From: Dave Scott @ 2014-09-10 12:46 UTC
  To: David Vrabel; +Cc: xen-devel, John Else, Anil Madhavapeddy


On 10 Sep 2014, at 13:41, David Vrabel <david.vrabel@citrix.com> wrote:

> On 05/09/14 22:04, Dave Scott wrote:
>> Hi,
>> 
>> I've been playing more with vchan and I think I've hit a bug in xen-gntdev.
>> It works ok if I map/unmap in a single page. If I increase the buffer size
>> and cause vchan to map a second page (non-contiguous mappings via 2 distinct
>> libxc calls), the process gets stuck in disk sleep when unmapping the second
>> page. The same happens if I comment out the unmap and let the program exit.
>> 
>> I'm using a 3.13 kernel:
> 
> Can you try the latest kernel?  This looks like it may be a bug in the
> generic memory hotplug device driver rather than something specific to
> gntdev.

OK, will give it a go.

Thanks,
Dave


* Re: xen-gntdev gets stuck unmapping a 2nd page
From: Dave Scott @ 2014-09-12 11:13 UTC
  To: David Vrabel; +Cc: xen-devel, John Else, Anil Madhavapeddy


On 10 Sep 2014, at 13:41, David Vrabel <david.vrabel@citrix.com> wrote:

> On 05/09/14 22:04, Dave Scott wrote:
>> Hi,
>> 
>> I've been playing more with vchan and I think I've hit a bug in xen-gntdev.
>> It works ok if I map/unmap in a single page. If I increase the buffer size
>> and cause vchan to map a second page (non-contiguous mappings via 2 distinct
>> libxc calls), the process gets stuck in disk sleep when unmapping the second
>> page. The same happens if I comment out the unmap and let the program exit.
>> 
>> I'm using a 3.13 kernel:
> 
> Can you try the latest kernel?  This looks like it may be a bug in the
> generic memory hotplug device driver rather than something specific to
> gntdev.

On a 3.17.0-rc4 (is that new enough?) it fails a bit differently:

    $ sudo xenstore-read domid
    9
    $ sudo ./test-gnt 9
    src=0x7fd8e9600000 ref=880 dest=0x7fd8e95fe000
    src=0x7fd8e95ff000 ref=879 dest=0x7fd8e95fd000
    Unmapped first page

<- previously it blocked here

    Unmapped second page

<- now it hangs here (2 out of 2 times so far) and hits a NULL pointer dereference:

Sep 12 12:03:47 ubuntu kernel: [  136.738101] test-gnt[1192]: segfault at 0 ip 00007f4dc7160407 sp 00007fff65bf0260 error 4 in libc-2.19.so[7f4dc7123000+1bb000]
Sep 12 12:05:03 ubuntu kernel: [  212.021101] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
Sep 12 12:05:03 ubuntu kernel: [  212.021161] IP: [<ffffffff81177420>] tlb_flush_mmu_free+0x20/0x50
Sep 12 12:05:03 ubuntu kernel: [  212.021202] PGD 0 
Sep 12 12:05:03 ubuntu kernel: [  212.021217] Oops: 0000 [#1] SMP 
Sep 12 12:05:03 ubuntu kernel: [  212.021241] Modules linked in: xen_gntalloc xen_gntdev ext2 ppdev microcode serio_raw i2c_piix4 parport_pc joydev mac_hid lp parport hid_generic usbhid hid psmouse floppy
Sep 12 12:05:03 ubuntu kernel: [  212.021371] CPU: 0 PID: 1225 Comm: sudo Not tainted 3.17.0-rc4 #4
Sep 12 12:05:03 ubuntu kernel: [  212.021407] Hardware name: Xen HVM domU, BIOS 4.4.0-xs88167-d 09/07/2014
Sep 12 12:05:03 ubuntu kernel: [  212.021446] task: ffff88002fbd8000 ti: ffff880024a04000 task.ti: ffff880024a04000
Sep 12 12:05:03 ubuntu kernel: [  212.021489] RIP: 0010:[<ffffffff81177420>]  [<ffffffff81177420>] tlb_flush_mmu_free+0x20/0x50
Sep 12 12:05:03 ubuntu kernel: [  212.021541] RSP: 0018:ffff880024a07ce8  EFLAGS: 00010286
Sep 12 12:05:03 ubuntu kernel: [  212.021572] RAX: ffffffffffffffff RBX: ffffffffffffffff RCX: ffff880024a07c68
Sep 12 12:05:03 ubuntu kernel: [  212.021613] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880024a07c68
Sep 12 12:05:03 ubuntu kernel: [  212.021654] RBP: ffff880024a07d00 R08: ffff880025c772d0 R09: 0000000000000000
Sep 12 12:05:03 ubuntu kernel: [  212.021695] R10: ffffea0000bb1e80 R11: 0000000000000206 R12: ffff880024a07e58
Sep 12 12:05:03 ubuntu kernel: [  212.021736] R13: ffff880024a07e30 R14: 00007f4867e72000 R15: ffff880024a07e30
Sep 12 12:05:03 ubuntu kernel: [  212.021782] FS:  00007f486a257840(0000) GS:ffff88003f400000(0000) knlGS:0000000000000000
Sep 12 12:05:03 ubuntu kernel: [  212.021828] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 12 12:05:03 ubuntu kernel: [  212.021863] CR2: 0000000000000007 CR3: 000000002f7a2000 CR4: 00000000000006f0
Sep 12 12:05:03 ubuntu kernel: [  212.021905] Stack:
Sep 12 12:05:03 ubuntu kernel: [  212.021918]  00007f4868086000 ffffea0000a70ac0 00007f4867e78000 ffff880024a07de0
Sep 12 12:05:03 ubuntu kernel: [  212.021967]  ffffffff81178d0d 00007f4867e77fff 00007f4867e77fff ffff88002f7a27f0
Sep 12 12:05:03 ubuntu kernel: [  212.022017]  00007f4867e78000 ffff88002ec70908 00007f4867e78000 00007f4867e77fff
Sep 12 12:05:03 ubuntu kernel: [  212.022066] Call Trace:
Sep 12 12:05:03 ubuntu kernel: [  212.022084]  [<ffffffff81178d0d>] unmap_single_vma+0x50d/0x8a0
Sep 12 12:05:03 ubuntu kernel: [  212.022119]  [<ffffffff81179af9>] unmap_vmas+0x49/0x90
Sep 12 12:05:03 ubuntu kernel: [  212.022151]  [<ffffffff8117ef6d>] unmap_region+0x9d/0x110
Sep 12 12:05:03 ubuntu kernel: [  212.022183]  [<ffffffff8117ecb8>] ? vma_gap_callbacks_rotate+0x18/0x30
Sep 12 12:05:03 ubuntu kernel: [  212.022222]  [<ffffffff81181288>] do_munmap+0x228/0x3b0
Sep 12 12:05:03 ubuntu kernel: [  212.022253]  [<ffffffff81181451>] vm_munmap+0x41/0x60
Sep 12 12:05:03 ubuntu kernel: [  212.022284]  [<ffffffff811823a2>] SyS_munmap+0x22/0x30
Sep 12 12:05:03 ubuntu kernel: [  212.022316]  [<ffffffff816f4ce9>] system_call_fastpath+0x16/0x1b
Sep 12 12:05:03 ubuntu kernel: [  212.022350] Code: 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 49 89 fd 41 54 4c 8d 67 28 4d 85 e4 53 4c 89 e3 74 1e 0f 1f 00 <8b> 73 08 48 8d 7b 10 e8 74 4e 01 00 c7 43 08 00 00 00 00 48 8b 
Sep 12 12:05:03 ubuntu kernel: [  212.025083] RIP  [<ffffffff81177420>] tlb_flush_mmu_free+0x20/0x50
Sep 12 12:05:03 ubuntu kernel: [  212.025083]  RSP <ffff880024a07ce8>
Sep 12 12:05:03 ubuntu kernel: [  212.025083] CR2: 0000000000000007
Sep 12 12:05:03 ubuntu kernel: [  212.028713] ---[ end trace c15bddd8545d0fb1 ]---
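
The fault address in the oops can be cross-checked against the register
dump. In the Code: line the faulting bytes are "<8b> 73 08", which decode
as a 32-bit load from [rbx + 8] (opcode 8B, ModRM 0x73, disp8 0x08). That
decoding is a reading of the dump, not something the log states. With
RBX = ffffffffffffffff the effective address wraps around to 7, which
matches CR2. A minimal sketch of the arithmetic, with the helper names
being illustrative only:

```c
#include <assert.h>  /* used by the check shown after this block */
#include <stdint.h>

/* Effective address of a [base + disp8] operand. The 8-bit displacement
 * is sign-extended and the sum wraps modulo 2^64. */
uint64_t effective_address(uint64_t base, int8_t disp8)
{
    return base + (uint64_t)(int64_t)disp8;
}

/* Values copied from the oops above. */
static const uint64_t oops_rbx = 0xffffffffffffffffULL;
static const uint64_t oops_cr2 = 0x0000000000000007ULL;

/* Returns 1 if CR2 is exactly RBX + 8, i.e. the load faulted on a
 * pointer value of -1 rather than on a NULL pointer. */
int fault_matches_rbx_plus_8(void)
{
    return effective_address(oops_rbx, 0x08) == oops_cr2;
}
```

A caller could check this with assert(fault_matches_rbx_plus_8());. If the
decoding is right, the kernel's "NULL pointer dereference" wording is
slightly misleading here: the pointer being followed would have been -1
(all ones), and the small fault address is just the member offset added to
it. What left that value in RBX is not visible from the trace.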

Cheers,
Dave
