xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)

From: Dave Scott @ 2014-08-27 21:33 UTC
  To: xen-devel; +Cc: John Else, Anil Madhavapeddy

Hi,

I've been playing with gntshr (as used by libvchan) and have noticed a
few problems. First, if I use xc_gntshr_share_pages to share more than
one page at a time, the pages seem to leak after xc_gntshr_munmap:

$ cat > test-gnt.c <<EOT
#include <xenctrl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

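int main(int argc, char *argv[])
{
    /* Number of pages to share per iteration. */
    int count = (argc > 1) ? atoi(argv[1]) : 0;
    uint32_t *refs;
    void *map;
    int total = 0;
    xc_gntshr *xgh;

    if (count <= 0) {
        fprintf(stderr, "usage: %s <pages-per-share>\n", argv[0]);
        return 2;
    }
    refs = malloc(count * sizeof(uint32_t));
    xgh = xc_gntshr_open(NULL, 0);
    if (!refs || !xgh)
        goto fail;
    /* Share and immediately unmap until something fails. */
    while (1) {
        /* Share with domid 0, writable. */
        map = xc_gntshr_share_pages(xgh, 0, count, refs, 1);
        if (!map)
            goto fail;
        if (xc_gntshr_munmap(xgh, map, count) != 0)
            goto fail;
        total++;
    }
fail:
    perror(NULL);
    fprintf(stderr, "Failed after %d iterations\n", total);
    return 1;
}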
EOT
$ gcc -o test-gnt test-gnt.c -lxenctrl
$ sudo ./test-gnt 2
xc: error: linux_gntshr_share_pages: ioctl failed (28 = No space left on device): Internal error
No space left on device
Failed after 512 iterations

Running it again gives:

$ sudo ./test-gnt 2
xc: error: linux_gntshr_share_pages: ioctl failed (28 = No space left on device): Internal error
No space left on device
Failed after 256 iterations

Subsequent runs fail earlier and earlier. I added some printf debugging
and noticed that the address returned by xc_gntshr_share_pages decreased
by 0x1000 on each iteration, which suggests that xc_gntshr_munmap was
unmapping the first page but missing the second.
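
The address drift can be turned into a rough per-iteration page count.
This is only a sketch: it assumes that a fully released mapping would be
handed back at the same address on the next iteration (usual for Linux
mmap placement, but not guaranteed), and the function name is made up
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of pages by which the mapping address moved down between two
 * consecutive iterations. 0 means the same address was reused; a
 * negative value means the address moved up instead.
 */
long pages_drifted(uintptr_t prev, uintptr_t cur, unsigned long page_size)
{
    assert(page_size != 0);
    if (cur > prev)
        return -(long)((cur - prev) / page_size);
    return (long)((prev - cur) / page_size);
}
```

With 4 KiB pages, a drop of 0x1000 between iterations corresponds to one
page per iteration that was apparently not released.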

I notice that xc_gntshr_munmap on Linux simply calls munmap:

static int linux_gntshr_munmap(xc_gntshr *xcg, xc_osdep_handle h,
                               void *start_address, uint32_t count)
{
    return munmap(start_address, count);
}

-- so I guess the problem is with the xen-gntalloc driver?
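
One thing that may be worth ruling out before blaming the driver:
munmap() takes its length in bytes, whereas the count passed to
xc_gntshr_munmap is a number of pages. Linux rounds the length up to a
whole page, so munmap(start_address, 2) would release only the first of
two pages, which would be consistent with the 0x1000 drift described
above. The sketch below reproduces that effect with plain anonymous
memory, with no Xen involved. The helper names are hypothetical, and
mincore() is used only as a convenient way to ask whether a page is
still mapped.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* 1 if the page at addr is mapped, 0 if it is not, -1 on any other error. */
int page_is_mapped(void *addr, size_t page_size)
{
    unsigned char vec;

    if (mincore(addr, page_size, &vec) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1;
}

/*
 * Map npages anonymous pages, call munmap() on the start of the mapping
 * with the given length in bytes, and return how many of the npages are
 * still mapped afterwards (-1 on error).
 */
int pages_left_after_munmap(size_t npages, size_t length)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    char *map = mmap(NULL, npages * page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int left = 0;
    size_t i;

    assert(map != MAP_FAILED);
    if (munmap(map, length) != 0)
        left = -1;
    for (i = 0; left >= 0 && i < npages; i++) {
        int mapped = page_is_mapped(map + i * page_size, page_size);

        if (mapped < 0)
            left = -1;
        else
            left += mapped;
    }
    /* Release whatever is still mapped so the helper does not leak. */
    munmap(map, npages * page_size);
    return left;
}
```

On Linux, pages_left_after_munmap(2, 2) should report one page still
mapped, while passing 2 * page size as the length should report none.
If the same thing is happening in linux_gntshr_munmap, the length would
need to be scaled by the page size; whether that fully explains the
ENOSPC failures is not something this sketch can show.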

If I share a single page at a time, it triggers a kernel BUG instead:

$ sudo ./test-gnt 1
[  148.564281] BUG: unable to handle kernel paging request at ffffc908001bff20
[  148.564299] IP: [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
[  148.564312] PGD 3d520067 PUD 0
[  148.564317] Oops: 0000 [#1] SMP
[  148.564322] CPU 0
[  148.564325] Modules linked in: xenfs xen_evtchn xen_gntalloc xen_gntdev lp parport
[  148.564337]
[  148.564340] Pid: 897, comm: test-gnt Not tainted 3.2.0-67-generic #101-Ubuntu
[  148.564348] RIP: e030:[<ffffffff813acf93>]  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
[  148.564356] RSP: e02b:ffff88003c655da0  EFLAGS: 00010286
[  148.564360] RAX: ffffc900001c0000 RBX: ffff88003cdb9e40 RCX: 0000000000000000
[  148.564365] RDX: 0000000000000000 RSI: 000000000007026e RDI: 00000000ffffffe4
[  148.564371] RBP: ffff88003c655dd8 R08: 0000000000000000 R09: 000000000003725f
[  148.564376] R10: ffffea0000ef3680 R11: 0000000000000000 R12: ffff88003cdb9e40
[  148.564381] R13: 0000000000000000 R14: ffff88003c655e80 R15: 0000000000000000
[  148.564389] FS:  00007ffe79406740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  148.564394] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  148.564400] CR2: ffffc908001bff20 CR3: 000000003cdc6000 CR4: 0000000000000660
[  148.564406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  148.564412] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  148.564418] Process test-gnt (pid: 897, threadinfo ffff88003c654000, task ffff88003cdd4500)
[  148.564423] Stack:
[  148.564426]  ffffffffa000d1a5 ffff88003c655dd8 ffffffff813adbdb 00000000ffffffe4
[  148.564435]  0000000000000000 00000000ffffffe4 ffff88003cdb9e40 ffff88003c655e68
[  148.564443]  ffffffffa000d848 ffff88003cc47790 ffff88003c5a8dc0 ffff8800041aeba8
[  148.564452] Call Trace:
[  148.564459]  [<ffffffffa000d1a5>] ? __del_gref+0x105/0x150 [xen_gntalloc]
[  148.564465]  [<ffffffff813adbdb>] ? gnttab_grant_foreign_access+0x2b/0x80
[  148.564471]  [<ffffffffa000d848>] add_grefs+0x1c8/0x2b0 [xen_gntalloc]
[  148.564478]  [<ffffffffa000da28>] gntalloc_ioctl_alloc+0xf8/0x160 [xen_gntalloc]
[  148.564485]  [<ffffffffa000dae0>] gntalloc_ioctl+0x50/0x64 [xen_gntalloc]
[  148.564492]  [<ffffffff8118d45a>] do_vfs_ioctl+0x8a/0x340
[  148.564498]  [<ffffffff811456b3>] ? do_munmap+0x1f3/0x2f0
[  148.564504]  [<ffffffff8118d7a1>] sys_ioctl+0x91/0xa0
[  148.564510]  [<ffffffff8166bd42>] system_call_fastpath+0x16/0x1b
[  148.564515] Code: f8 48 8b 15 98 89 b6 00 66 89 04 fa 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 89 b6 00 89 ff 5d <0f> b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
[  148.564577] RIP  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
[  148.564583]  RSP <ffff88003c655da0>
[  148.564586] CR2: ffffc908001bff20
[  148.564591] ---[ end trace 57b3a513f0d79bd6 ]---

This is on an Ubuntu trusty (PV) domU running on xen-4.4

Linux ubuntu 3.2.0-67-generic #101-Ubuntu SMP Tue Jul 15 17:46:11 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

It works for a lot longer if I slow the loop down with a printf() in the middle.

It looks to me like two separate bugs: (i) a leak when unmapping > 1 page; and (ii) a race condition triggered by a tight share/unmap loop.

FWIW I've had no problem with the xen-gntdev driver so far.

Cheers,
Dave
