All of lore.kernel.org
 help / color / mirror / Atom feed
* xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)
@ 2014-08-27 21:33 Dave Scott
  2014-08-28 13:50 ` David Vrabel
  2014-09-02 13:55 ` David Vrabel
  0 siblings, 2 replies; 9+ messages in thread
From: Dave Scott @ 2014-08-27 21:33 UTC (permalink / raw)
  To: xen-devel; +Cc: John Else, Anil Madhavapeddy

Hi,

I've been playing with gntshr (as used by libvchan) and have noticed a
few problems. Firstly if I use xc_gntshr_share_pages to share > 1 page
then it seems to leak after xc_gntshr_munmap:

$ cat > test-gnt.c <<EOT
#include <xenctrl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

void main(int argc, char* argv[]){
    int count = atoi(argv[1]);
    uint32_t *refs = (uint32_t*)malloc(count * sizeof(uint32_t));
    void *map;
    int total = 0;
    xc_gntshr *xgh = xc_gntshr_open(NULL, 0);
    if (!xgh)
        goto fail;
    while(1) {
        map = xc_gntshr_share_pages(xgh, 0, count, refs, 1);
        if (!map)
            goto fail;
        if (xc_gntshr_munmap(xgh, map, count) != 0)
            goto fail;
        total++;
    }
fail:
    perror(NULL);
    fprintf(stderr, "Failed after %d iterations\n", total);
    fflush(stderr);
    exit(1);

}
EOT
$ gcc -o test-gnt test-gnt.c -lxenctrl
$ sudo ./test-gnt 2
xc: error: linux_gntshr_share_pages: ioctl failed (28 = No space left on device): Internal error
No space left on device
Failed after 512 iterations

Running it again gives:

$ sudo ./test-gnt 2
xc: error: linux_gntshr_share_pages: ioctl failed (28 = No space left on device): Internal error
No space left on device
Failed after 256 iterations

... subsequent runs fail earlier and earlier. I added some printf debugging
and noticed that the address returned by xc_gntshr_share_pages was decreasing
by 0x1000 per iteration, suggesting that the xc_gntshr_munmap was unmapping
the first page but missing the second.

I notice xc_gntshr_munmap for Linux simply calls 'munmap'

static int linux_gntshr_munmap(xc_gntshr *xcg, xc_osdep_handle h,
                               void *start_address, uint32_t count)
{
    return munmap(start_address, count);
}

-- so I guess the problem is with the xen-gntalloc driver?

If I share single pages at a time then it triggers a BUG:
$ sudo ./test-gnt 1
[  148.564281] BUG: unable to handle kernel paging request at ffffc908001bff20
[  148.564299] IP: [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
[  148.564312] PGD 3d520067 PUD 0
[  148.564317] Oops: 0000 [#1] SMP
[  148.564322] CPU 0
[  148.564325] Modules linked in: xenfs xen_evtchn xen_gntalloc xen_gntdev lp parport
[  148.564337]
[  148.564340] Pid: 897, comm: test-gnt Not tainted 3.2.0-67-generic #101-Ubuntu
[  148.564348] RIP: e030:[<ffffffff813acf93>]  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
[  148.564356] RSP: e02b:ffff88003c655da0  EFLAGS: 00010286
[  148.564360] RAX: ffffc900001c0000 RBX: ffff88003cdb9e40 RCX: 0000000000000000
[  148.564365] RDX: 0000000000000000 RSI: 000000000007026e RDI: 00000000ffffffe4
[  148.564371] RBP: ffff88003c655dd8 R08: 0000000000000000 R09: 000000000003725f
[  148.564376] R10: ffffea0000ef3680 R11: 0000000000000000 R12: ffff88003cdb9e40
[  148.564381] R13: 0000000000000000 R14: ffff88003c655e80 R15: 0000000000000000
[  148.564389] FS:  00007ffe79406740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  148.564394] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  148.564400] CR2: ffffc908001bff20 CR3: 000000003cdc6000 CR4: 0000000000000660
[  148.564406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  148.564412] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  148.564418] Process test-gnt (pid: 897, threadinfo ffff88003c654000, task ffff88003cdd4500)
[  148.564423] Stack:
[  148.564426]  ffffffffa000d1a5 ffff88003c655dd8 ffffffff813adbdb 00000000ffffffe4
[  148.564435]  0000000000000000 00000000ffffffe4 ffff88003cdb9e40 ffff88003c655e68
[  148.564443]  ffffffffa000d848 ffff88003cc47790 ffff88003c5a8dc0 ffff8800041aeba8
[  148.564452] Call Trace:
[  148.564459]  [<ffffffffa000d1a5>] ? __del_gref+0x105/0x150 [xen_gntalloc]
[  148.564465]  [<ffffffff813adbdb>] ? gnttab_grant_foreign_access+0x2b/0x80
[  148.564471]  [<ffffffffa000d848>] add_grefs+0x1c8/0x2b0 [xen_gntalloc]
[  148.564478]  [<ffffffffa000da28>] gntalloc_ioctl_alloc+0xf8/0x160 [xen_gntalloc]
[  148.564485]  [<ffffffffa000dae0>] gntalloc_ioctl+0x50/0x64 [xen_gntalloc]
[  148.564492]  [<ffffffff8118d45a>] do_vfs_ioctl+0x8a/0x340
[  148.564498]  [<ffffffff811456b3>] ? do_munmap+0x1f3/0x2f0
[  148.564504]  [<ffffffff8118d7a1>] sys_ioctl+0x91/0xa0
[  148.564510]  [<ffffffff8166bd42>] system_call_fastpath+0x16/0x1b
[  148.564515] Code: f8 48 8b 15 98 89 b6 00 66 89 04 fa 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 89 b6 00 89 ff 5d <0f> b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
[  148.564577] RIP  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
[  148.564583]  RSP <ffff88003c655da0>
[  148.564586] CR2: ffffc908001bff20
[  148.564591] ---[ end trace 57b3a513f0d79bd6 ]---

This is on an Ubuntu trusty (PV) domU running on xen-4.4

Linux ubuntu 3.2.0-67-generic #101-Ubuntu SMP Tue Jul 15 17:46:11 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

It works for a lot longer if I slow the loop down with a printf() in the middle.

It looks to me like two separate bugs: (i) a leak when unmapping > 1 page; and (ii) a race condition triggered by a tight share/unmap loop.

FWIW I've had no problem with the xen-gntdev driver so far.

Cheers,
Dave

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)
  2014-08-27 21:33 xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?) Dave Scott
@ 2014-08-28 13:50 ` David Vrabel
  2014-08-29 12:40   ` Dave Scott
  2014-09-02 13:55 ` David Vrabel
  1 sibling, 1 reply; 9+ messages in thread
From: David Vrabel @ 2014-08-28 13:50 UTC (permalink / raw)
  To: Dave Scott, xen-devel; +Cc: John Else, Anil Madhavapeddy

On 27/08/14 22:33, Dave Scott wrote:
> I notice xc_gntshr_munmap for Linux simply calls 'munmap'
> 
> static int linux_gntshr_munmap(xc_gntshr *xcg, xc_osdep_handle h,
>                                void *start_address, uint32_t count)
> {
>     return munmap(start_address, count);
> }

munmap() needs a byte length, not a page count.

When using xc_gntshr_munmap() with multiple pages this results in none
of the grefs being deleted (unshared and freed) since a mapping to some
of the grefs in the set remain.

This doesn't appear to explain why they're not deleted by the device is
closed.

> -- so I guess the problem is with the xen-gntalloc driver?
> 
> If I share single pages at a time then it triggers a BUG:
> $ sudo ./test-gnt 1
> [  148.564281] BUG: unable to handle kernel paging request at ffffc908001bff20
> [  148.564299] IP: [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
> [  148.564312] PGD 3d520067 PUD 0
> [  148.564317] Oops: 0000 [#1] SMP
> [  148.564322] CPU 0
> [  148.564325] Modules linked in: xenfs xen_evtchn xen_gntalloc xen_gntdev lp parport
> [  148.564337]
> [  148.564340] Pid: 897, comm: test-gnt Not tainted 3.2.0-67-generic #101-Ubuntu
> [  148.564348] RIP: e030:[<ffffffff813acf93>]  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
> [  148.564356] RSP: e02b:ffff88003c655da0  EFLAGS: 00010286
> [  148.564360] RAX: ffffc900001c0000 RBX: ffff88003cdb9e40 RCX: 0000000000000000
> [  148.564365] RDX: 0000000000000000 RSI: 000000000007026e RDI: 00000000ffffffe4
> [  148.564371] RBP: ffff88003c655dd8 R08: 0000000000000000 R09: 000000000003725f
> [  148.564376] R10: ffffea0000ef3680 R11: 0000000000000000 R12: ffff88003cdb9e40
> [  148.564381] R13: 0000000000000000 R14: ffff88003c655e80 R15: 0000000000000000
> [  148.564389] FS:  00007ffe79406740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
> [  148.564394] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  148.564400] CR2: ffffc908001bff20 CR3: 000000003cdc6000 CR4: 0000000000000660
> [  148.564406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  148.564412] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  148.564418] Process test-gnt (pid: 897, threadinfo ffff88003c654000, task ffff88003cdd4500)
> [  148.564423] Stack:
> [  148.564426]  ffffffffa000d1a5 ffff88003c655dd8 ffffffff813adbdb 00000000ffffffe4
> [  148.564435]  0000000000000000 00000000ffffffe4 ffff88003cdb9e40 ffff88003c655e68
> [  148.564443]  ffffffffa000d848 ffff88003cc47790 ffff88003c5a8dc0 ffff8800041aeba8
> [  148.564452] Call Trace:
> [  148.564459]  [<ffffffffa000d1a5>] ? __del_gref+0x105/0x150 [xen_gntalloc]
> [  148.564465]  [<ffffffff813adbdb>] ? gnttab_grant_foreign_access+0x2b/0x80
> [  148.564471]  [<ffffffffa000d848>] add_grefs+0x1c8/0x2b0 [xen_gntalloc]
> [  148.564478]  [<ffffffffa000da28>] gntalloc_ioctl_alloc+0xf8/0x160 [xen_gntalloc]
> [  148.564485]  [<ffffffffa000dae0>] gntalloc_ioctl+0x50/0x64 [xen_gntalloc]
> [  148.564492]  [<ffffffff8118d45a>] do_vfs_ioctl+0x8a/0x340
> [  148.564498]  [<ffffffff811456b3>] ? do_munmap+0x1f3/0x2f0
> [  148.564504]  [<ffffffff8118d7a1>] sys_ioctl+0x91/0xa0
> [  148.564510]  [<ffffffff8166bd42>] system_call_fastpath+0x16/0x1b
> [  148.564515] Code: f8 48 8b 15 98 89 b6 00 66 89 04 fa 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 89 b6 00 89 ff 5d <0f> b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
> [  148.564577] RIP  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
> [  148.564583]  RSP <ffff88003c655da0>
> [  148.564586] CR2: ffffc908001bff20
> [  148.564591] ---[ end trace 57b3a513f0d79bd6 ]---

Does this patch fix the oops?

8<-------------------------------------
xen/gntalloc: safely delete grefs in add_grefs() undo path

If a gref could not be added (perhaps because the limit has been
reached or there are no more grant references available).  The undo
path may crash because __del_gref() frees the gref while it is being
used for a list iteration.

A comment suggests that using list_for_each_entry() is safe since the
gref isn't removed from the list being iterated over, but it is freed
and thus list_for_each_entry_safe() must be used.

Also, explicitly delete the gref from the per-file list, even though
this is not strictly necessary.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/gntalloc.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 787d179..b8af1ba 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -124,7 +124,7 @@ static int add_grefs(struct
ioctl_gntalloc_alloc_gref *op,
 	int i, rc, readonly;
 	LIST_HEAD(queue_gref);
 	LIST_HEAD(queue_file);
-	struct gntalloc_gref *gref;
+	struct gntalloc_gref *gref, *next;

 	readonly = !(op->flags & GNTALLOC_FLAG_WRITABLE);
 	rc = -ENOMEM;
@@ -162,9 +162,9 @@ undo:
 	mutex_lock(&gref_mutex);
 	gref_size -= (op->count - i);

-	list_for_each_entry(gref, &queue_file, next_file) {
-		/* __del_gref does not remove from queue_file */
+	list_for_each_entry_safe(gref, next, &queue_file, next_file) {
 		__del_gref(gref);
+		list_del(&gref->next_file);
 	}

 	/* It's possible for the target domain to map the just-allocated grant
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)
  2014-08-28 13:50 ` David Vrabel
@ 2014-08-29 12:40   ` Dave Scott
  2014-08-29 14:05     ` David Vrabel
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Scott @ 2014-08-29 12:40 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, John Else, Anil Madhavapeddy

Hi,

On 28 Aug 2014, at 14:50, David Vrabel <david.vrabel@citrix.com> wrote:

> On 27/08/14 22:33, Dave Scott wrote:
>> I notice xc_gntshr_munmap for Linux simply calls 'munmap'
>> 
>> static int linux_gntshr_munmap(xc_gntshr *xcg, xc_osdep_handle h,
>>                               void *start_address, uint32_t count)
>> {
>>    return munmap(start_address, count);
>> }
> 
> munmap() needs a byte length, not a page count.
> 
> When using xc_gntshr_munmap() with multiple pages this results in none
> of the grefs being deleted (unshared and freed) since a mapping to some
> of the grefs in the set remain.

Aha, good spot. I worked around this in my test program by calling xc_gntshr_munmap with (count * 4096) and it’s no longer leaking.

> This doesn't appear to explain why they're not deleted by the device is
> closed.

After rebuilding xen-gntalloc with your change I couldn’t reproduce this. I tried all combinations of

  * map 1 or 2 pages
  * unmap 0, 1 or 2 pages

and, although some of the iterations did run out of grant references (as expected), nothing seemed to be leaked over a close.


> 
>> -- so I guess the problem is with the xen-gntalloc driver?
>> 
>> If I share single pages at a time then it triggers a BUG:
>> $ sudo ./test-gnt 1
>> [  148.564281] BUG: unable to handle kernel paging request at ffffc908001bff20
>> [  148.564299] IP: [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>> [  148.564312] PGD 3d520067 PUD 0
>> [  148.564317] Oops: 0000 [#1] SMP
>> [  148.564322] CPU 0
>> [  148.564325] Modules linked in: xenfs xen_evtchn xen_gntalloc xen_gntdev lp parport
>> [  148.564337]
>> [  148.564340] Pid: 897, comm: test-gnt Not tainted 3.2.0-67-generic #101-Ubuntu
>> [  148.564348] RIP: e030:[<ffffffff813acf93>]  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>> [  148.564356] RSP: e02b:ffff88003c655da0  EFLAGS: 00010286
>> [  148.564360] RAX: ffffc900001c0000 RBX: ffff88003cdb9e40 RCX: 0000000000000000
>> [  148.564365] RDX: 0000000000000000 RSI: 000000000007026e RDI: 00000000ffffffe4
>> [  148.564371] RBP: ffff88003c655dd8 R08: 0000000000000000 R09: 000000000003725f
>> [  148.564376] R10: ffffea0000ef3680 R11: 0000000000000000 R12: ffff88003cdb9e40
>> [  148.564381] R13: 0000000000000000 R14: ffff88003c655e80 R15: 0000000000000000
>> [  148.564389] FS:  00007ffe79406740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
>> [  148.564394] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [  148.564400] CR2: ffffc908001bff20 CR3: 000000003cdc6000 CR4: 0000000000000660
>> [  148.564406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  148.564412] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [  148.564418] Process test-gnt (pid: 897, threadinfo ffff88003c654000, task ffff88003cdd4500)
>> [  148.564423] Stack:
>> [  148.564426]  ffffffffa000d1a5 ffff88003c655dd8 ffffffff813adbdb 00000000ffffffe4
>> [  148.564435]  0000000000000000 00000000ffffffe4 ffff88003cdb9e40 ffff88003c655e68
>> [  148.564443]  ffffffffa000d848 ffff88003cc47790 ffff88003c5a8dc0 ffff8800041aeba8
>> [  148.564452] Call Trace:
>> [  148.564459]  [<ffffffffa000d1a5>] ? __del_gref+0x105/0x150 [xen_gntalloc]
>> [  148.564465]  [<ffffffff813adbdb>] ? gnttab_grant_foreign_access+0x2b/0x80
>> [  148.564471]  [<ffffffffa000d848>] add_grefs+0x1c8/0x2b0 [xen_gntalloc]
>> [  148.564478]  [<ffffffffa000da28>] gntalloc_ioctl_alloc+0xf8/0x160 [xen_gntalloc]
>> [  148.564485]  [<ffffffffa000dae0>] gntalloc_ioctl+0x50/0x64 [xen_gntalloc]
>> [  148.564492]  [<ffffffff8118d45a>] do_vfs_ioctl+0x8a/0x340
>> [  148.564498]  [<ffffffff811456b3>] ? do_munmap+0x1f3/0x2f0
>> [  148.564504]  [<ffffffff8118d7a1>] sys_ioctl+0x91/0xa0
>> [  148.564510]  [<ffffffff8166bd42>] system_call_fastpath+0x16/0x1b
>> [  148.564515] Code: f8 48 8b 15 98 89 b6 00 66 89 04 fa 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 89 b6 00 89 ff 5d <0f> b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
>> [  148.564577] RIP  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>> [  148.564583]  RSP <ffff88003c655da0>
>> [  148.564586] CR2: ffffc908001bff20
>> [  148.564591] ---[ end trace 57b3a513f0d79bd6 ]---
> 
> Does this patch fix the oops?

Yes, I’ve left my test case running for several hours with no sign of trouble.

Thanks!
Dave

> 
> 8<-------------------------------------
> xen/gntalloc: safely delete grefs in add_grefs() undo path
> 
> If a gref could not be added (perhaps because the limit has been
> reached or there are no more grant references available).  The undo
> path may crash because __del_gref() frees the gref while it is being
> used for a list iteration.
> 
> A comment suggests that using list_for_each_entry() is safe since the
> gref isn't removed from the list being iterated over, but it is freed
> and thus list_for_each_entry_safe() must be used.
> 
> Also, explicitly delete the gref from the per-file list, even though
> this is not strictly necessary.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> drivers/xen/gntalloc.c |    6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
> index 787d179..b8af1ba 100644
> --- a/drivers/xen/gntalloc.c
> +++ b/drivers/xen/gntalloc.c
> @@ -124,7 +124,7 @@ static int add_grefs(struct
> ioctl_gntalloc_alloc_gref *op,
> 	int i, rc, readonly;
> 	LIST_HEAD(queue_gref);
> 	LIST_HEAD(queue_file);
> -	struct gntalloc_gref *gref;
> +	struct gntalloc_gref *gref, *next;
> 
> 	readonly = !(op->flags & GNTALLOC_FLAG_WRITABLE);
> 	rc = -ENOMEM;
> @@ -162,9 +162,9 @@ undo:
> 	mutex_lock(&gref_mutex);
> 	gref_size -= (op->count - i);
> 
> -	list_for_each_entry(gref, &queue_file, next_file) {
> -		/* __del_gref does not remove from queue_file */
> +	list_for_each_entry_safe(gref, next, &queue_file, next_file) {
> 		__del_gref(gref);
> +		list_del(&gref->next_file);
> 	}
> 
> 	/* It's possible for the target domain to map the just-allocated grant
> -- 
> 1.7.10.4
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)
  2014-08-29 12:40   ` Dave Scott
@ 2014-08-29 14:05     ` David Vrabel
  2014-08-29 16:46       ` Dave Scott
  2014-08-29 21:15       ` Dave Scott
  0 siblings, 2 replies; 9+ messages in thread
From: David Vrabel @ 2014-08-29 14:05 UTC (permalink / raw)
  To: Dave Scott, David Vrabel
  Cc: xen-devel, Boris Ostrovsky, John Else, Anil Madhavapeddy

On 29/08/14 13:40, Dave Scott wrote:
> Hi,
> 
> On 28 Aug 2014, at 14:50, David Vrabel <david.vrabel@citrix.com> wrote:
> 
>> On 27/08/14 22:33, Dave Scott wrote:
>>> I notice xc_gntshr_munmap for Linux simply calls 'munmap'
>>>
>>> static int linux_gntshr_munmap(xc_gntshr *xcg, xc_osdep_handle h,
>>>                               void *start_address, uint32_t count)
>>> {
>>>    return munmap(start_address, count);
>>> }
>>
>> munmap() needs a byte length, not a page count.
>>
>> When using xc_gntshr_munmap() with multiple pages this results in none
>> of the grefs being deleted (unshared and freed) since a mapping to some
>> of the grefs in the set remain.
> 
> Aha, good spot. I worked around this in my test program by calling xc_gntshr_munmap with (count * 4096) and it’s no longer leaking.
> 
>> This doesn't appear to explain why they're not deleted by the device is
>> closed.
> 
> After rebuilding xen-gntalloc with your change I couldn’t reproduce this. I tried all combinations of
> 
>   * map 1 or 2 pages
>   * unmap 0, 1 or 2 pages
> 
> and, although some of the iterations did run out of grant references (as expected), nothing seemed to be leaked over a close.

Ok,  Are you going to submit a libxc patch fixing the munmap() length?

>>> -- so I guess the problem is with the xen-gntalloc driver?
>>>
>>> If I share single pages at a time then it triggers a BUG:
>>> $ sudo ./test-gnt 1
>>> [  148.564281] BUG: unable to handle kernel paging request at ffffc908001bff20
>>> [  148.564299] IP: [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>> [  148.564312] PGD 3d520067 PUD 0
>>> [  148.564317] Oops: 0000 [#1] SMP
>>> [  148.564322] CPU 0
>>> [  148.564325] Modules linked in: xenfs xen_evtchn xen_gntalloc xen_gntdev lp parport
>>> [  148.564337]
>>> [  148.564340] Pid: 897, comm: test-gnt Not tainted 3.2.0-67-generic #101-Ubuntu
>>> [  148.564348] RIP: e030:[<ffffffff813acf93>]  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>> [  148.564356] RSP: e02b:ffff88003c655da0  EFLAGS: 00010286
>>> [  148.564360] RAX: ffffc900001c0000 RBX: ffff88003cdb9e40 RCX: 0000000000000000
>>> [  148.564365] RDX: 0000000000000000 RSI: 000000000007026e RDI: 00000000ffffffe4
>>> [  148.564371] RBP: ffff88003c655dd8 R08: 0000000000000000 R09: 000000000003725f
>>> [  148.564376] R10: ffffea0000ef3680 R11: 0000000000000000 R12: ffff88003cdb9e40
>>> [  148.564381] R13: 0000000000000000 R14: ffff88003c655e80 R15: 0000000000000000
>>> [  148.564389] FS:  00007ffe79406740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
>>> [  148.564394] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> [  148.564400] CR2: ffffc908001bff20 CR3: 000000003cdc6000 CR4: 0000000000000660
>>> [  148.564406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [  148.564412] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> [  148.564418] Process test-gnt (pid: 897, threadinfo ffff88003c654000, task ffff88003cdd4500)
>>> [  148.564423] Stack:
>>> [  148.564426]  ffffffffa000d1a5 ffff88003c655dd8 ffffffff813adbdb 00000000ffffffe4
>>> [  148.564435]  0000000000000000 00000000ffffffe4 ffff88003cdb9e40 ffff88003c655e68
>>> [  148.564443]  ffffffffa000d848 ffff88003cc47790 ffff88003c5a8dc0 ffff8800041aeba8
>>> [  148.564452] Call Trace:
>>> [  148.564459]  [<ffffffffa000d1a5>] ? __del_gref+0x105/0x150 [xen_gntalloc]
>>> [  148.564465]  [<ffffffff813adbdb>] ? gnttab_grant_foreign_access+0x2b/0x80
>>> [  148.564471]  [<ffffffffa000d848>] add_grefs+0x1c8/0x2b0 [xen_gntalloc]
>>> [  148.564478]  [<ffffffffa000da28>] gntalloc_ioctl_alloc+0xf8/0x160 [xen_gntalloc]
>>> [  148.564485]  [<ffffffffa000dae0>] gntalloc_ioctl+0x50/0x64 [xen_gntalloc]
>>> [  148.564492]  [<ffffffff8118d45a>] do_vfs_ioctl+0x8a/0x340
>>> [  148.564498]  [<ffffffff811456b3>] ? do_munmap+0x1f3/0x2f0
>>> [  148.564504]  [<ffffffff8118d7a1>] sys_ioctl+0x91/0xa0
>>> [  148.564510]  [<ffffffff8166bd42>] system_call_fastpath+0x16/0x1b
>>> [  148.564515] Code: f8 48 8b 15 98 89 b6 00 66 89 04 fa 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 89 b6 00 89 ff 5d <0f> b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
>>> [  148.564577] RIP  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>> [  148.564583]  RSP <ffff88003c655da0>
>>> [  148.564586] CR2: ffffc908001bff20
>>> [  148.564591] ---[ end trace 57b3a513f0d79bd6 ]---
>>
>> Does this patch fix the oops?
> 
> Yes, I’ve left my test case running for several hours with no sign of trouble.

Thanks. I'll take that as a

Tested-by: Dave Scott <david.scott@citrix.com>

Konrad, Boris, can you review please?

David

>> 8<-------------------------------------
>> xen/gntalloc: safely delete grefs in add_grefs() undo path
>>
>> If a gref could not be added (perhaps because the limit has been
>> reached or there are no more grant references available).  The undo
>> path may crash because __del_gref() frees the gref while it is being
>> used for a list iteration.
>>
>> A comment suggests that using list_for_each_entry() is safe since the
>> gref isn't removed from the list being iterated over, but it is freed
>> and thus list_for_each_entry_safe() must be used.
>>
>> Also, explicitly delete the gref from the per-file list, even though
>> this is not strictly necessary.
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> ---
>> drivers/xen/gntalloc.c |    6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
>> index 787d179..b8af1ba 100644
>> --- a/drivers/xen/gntalloc.c
>> +++ b/drivers/xen/gntalloc.c
>> @@ -124,7 +124,7 @@ static int add_grefs(struct
>> ioctl_gntalloc_alloc_gref *op,
>> 	int i, rc, readonly;
>> 	LIST_HEAD(queue_gref);
>> 	LIST_HEAD(queue_file);
>> -	struct gntalloc_gref *gref;
>> +	struct gntalloc_gref *gref, *next;
>>
>> 	readonly = !(op->flags & GNTALLOC_FLAG_WRITABLE);
>> 	rc = -ENOMEM;
>> @@ -162,9 +162,9 @@ undo:
>> 	mutex_lock(&gref_mutex);
>> 	gref_size -= (op->count - i);
>>
>> -	list_for_each_entry(gref, &queue_file, next_file) {
>> -		/* __del_gref does not remove from queue_file */
>> +	list_for_each_entry_safe(gref, next, &queue_file, next_file) {
>> 		__del_gref(gref);
>> +		list_del(&gref->next_file);
>> 	}
>>
>> 	/* It's possible for the target domain to map the just-allocated grant

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)
  2014-08-29 14:05     ` David Vrabel
@ 2014-08-29 16:46       ` Dave Scott
  2014-08-29 21:15       ` Dave Scott
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Scott @ 2014-08-29 16:46 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Boris Ostrovsky, John Else, Anil Madhavapeddy


On 29 Aug 2014, at 15:05, David Vrabel <david.vrabel@citrix.com> wrote:

> On 29/08/14 13:40, Dave Scott wrote:
>> Hi,
>> 
>> On 28 Aug 2014, at 14:50, David Vrabel <david.vrabel@citrix.com> wrote:
>> 
>>> On 27/08/14 22:33, Dave Scott wrote:
>>>> I notice xc_gntshr_munmap for Linux simply calls 'munmap'
>>>> 
>>>> static int linux_gntshr_munmap(xc_gntshr *xcg, xc_osdep_handle h,
>>>>                              void *start_address, uint32_t count)
>>>> {
>>>>   return munmap(start_address, count);
>>>> }
>>> 
>>> munmap() needs a byte length, not a page count.
>>> 
>>> When using xc_gntshr_munmap() with multiple pages this results in none
>>> of the grefs being deleted (unshared and freed) since a mapping to some
>>> of the grefs in the set remain.
>> 
>> Aha, good spot. I worked around this in my test program by calling xc_gntshr_munmap with (count * 4096) and it’s no longer leaking.
>> 
>>> This doesn't appear to explain why they're not deleted by the device is
>>> closed.
>> 
>> After rebuilding xen-gntalloc with your change I couldn’t reproduce this. I tried all combinations of
>> 
>>  * map 1 or 2 pages
>>  * unmap 0, 1 or 2 pages
>> 
>> and, although some of the iterations did run out of grant references (as expected), nothing seemed to be leaked over a close.
> 
> Ok,  Are you going to submit a libxc patch fixing the munmap() length?

Ok, will do.

Cheers,
Dave

> 
>>>> -- so I guess the problem is with the xen-gntalloc driver?
>>>> 
>>>> If I share single pages at a time then it triggers a BUG:
>>>> $ sudo ./test-gnt 1
>>>> [  148.564281] BUG: unable to handle kernel paging request at ffffc908001bff20
>>>> [  148.564299] IP: [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>>> [  148.564312] PGD 3d520067 PUD 0
>>>> [  148.564317] Oops: 0000 [#1] SMP
>>>> [  148.564322] CPU 0
>>>> [  148.564325] Modules linked in: xenfs xen_evtchn xen_gntalloc xen_gntdev lp parport
>>>> [  148.564337]
>>>> [  148.564340] Pid: 897, comm: test-gnt Not tainted 3.2.0-67-generic #101-Ubuntu
>>>> [  148.564348] RIP: e030:[<ffffffff813acf93>]  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>>> [  148.564356] RSP: e02b:ffff88003c655da0  EFLAGS: 00010286
>>>> [  148.564360] RAX: ffffc900001c0000 RBX: ffff88003cdb9e40 RCX: 0000000000000000
>>>> [  148.564365] RDX: 0000000000000000 RSI: 000000000007026e RDI: 00000000ffffffe4
>>>> [  148.564371] RBP: ffff88003c655dd8 R08: 0000000000000000 R09: 000000000003725f
>>>> [  148.564376] R10: ffffea0000ef3680 R11: 0000000000000000 R12: ffff88003cdb9e40
>>>> [  148.564381] R13: 0000000000000000 R14: ffff88003c655e80 R15: 0000000000000000
>>>> [  148.564389] FS:  00007ffe79406740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
>>>> [  148.564394] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [  148.564400] CR2: ffffc908001bff20 CR3: 000000003cdc6000 CR4: 0000000000000660
>>>> [  148.564406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [  148.564412] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> [  148.564418] Process test-gnt (pid: 897, threadinfo ffff88003c654000, task ffff88003cdd4500)
>>>> [  148.564423] Stack:
>>>> [  148.564426]  ffffffffa000d1a5 ffff88003c655dd8 ffffffff813adbdb 00000000ffffffe4
>>>> [  148.564435]  0000000000000000 00000000ffffffe4 ffff88003cdb9e40 ffff88003c655e68
>>>> [  148.564443]  ffffffffa000d848 ffff88003cc47790 ffff88003c5a8dc0 ffff8800041aeba8
>>>> [  148.564452] Call Trace:
>>>> [  148.564459]  [<ffffffffa000d1a5>] ? __del_gref+0x105/0x150 [xen_gntalloc]
>>>> [  148.564465]  [<ffffffff813adbdb>] ? gnttab_grant_foreign_access+0x2b/0x80
>>>> [  148.564471]  [<ffffffffa000d848>] add_grefs+0x1c8/0x2b0 [xen_gntalloc]
>>>> [  148.564478]  [<ffffffffa000da28>] gntalloc_ioctl_alloc+0xf8/0x160 [xen_gntalloc]
>>>> [  148.564485]  [<ffffffffa000dae0>] gntalloc_ioctl+0x50/0x64 [xen_gntalloc]
>>>> [  148.564492]  [<ffffffff8118d45a>] do_vfs_ioctl+0x8a/0x340
>>>> [  148.564498]  [<ffffffff811456b3>] ? do_munmap+0x1f3/0x2f0
>>>> [  148.564504]  [<ffffffff8118d7a1>] sys_ioctl+0x91/0xa0
>>>> [  148.564510]  [<ffffffff8166bd42>] system_call_fastpath+0x16/0x1b
>>>> [  148.564515] Code: f8 48 8b 15 98 89 b6 00 66 89 04 fa 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 89 b6 00 89 ff 5d <0f> b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
>>>> [  148.564577] RIP  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>>> [  148.564583]  RSP <ffff88003c655da0>
>>>> [  148.564586] CR2: ffffc908001bff20
>>>> [  148.564591] ---[ end trace 57b3a513f0d79bd6 ]---
>>> 
>>> Does this patch fix the oops?
>> 
>> Yes, I’ve left my test case running for several hours with no sign of trouble.
> 
> Thanks. I'll take that as a
> 
> Tested-by: Dave Scott <david.scott@citrix.com>
> 
> Konrad, Boris, can you review please?
> 
> David
> 
>>> 8<-------------------------------------
>>> xen/gntalloc: safely delete grefs in add_grefs() undo path
>>> 
>>> If a gref could not be added (perhaps because the limit has been
>>> reached or there are no more grant references available).  The undo
>>> path may crash because __del_gref() frees the gref while it is being
>>> used for a list iteration.
>>> 
>>> A comment suggests that using list_for_each_entry() is safe since the
>>> gref isn't removed from the list being iterated over, but it is freed
>>> and thus list_for_each_entry_safe() must be used.
>>> 
>>> Also, explicitly delete the gref from the per-file list, even though
>>> this is not strictly necessary.
>>> 
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>> ---
>>> drivers/xen/gntalloc.c |    6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
>>> index 787d179..b8af1ba 100644
>>> --- a/drivers/xen/gntalloc.c
>>> +++ b/drivers/xen/gntalloc.c
>>> @@ -124,7 +124,7 @@ static int add_grefs(struct
>>> ioctl_gntalloc_alloc_gref *op,
>>> 	int i, rc, readonly;
>>> 	LIST_HEAD(queue_gref);
>>> 	LIST_HEAD(queue_file);
>>> -	struct gntalloc_gref *gref;
>>> +	struct gntalloc_gref *gref, *next;
>>> 
>>> 	readonly = !(op->flags & GNTALLOC_FLAG_WRITABLE);
>>> 	rc = -ENOMEM;
>>> @@ -162,9 +162,9 @@ undo:
>>> 	mutex_lock(&gref_mutex);
>>> 	gref_size -= (op->count - i);
>>> 
>>> -	list_for_each_entry(gref, &queue_file, next_file) {
>>> -		/* __del_gref does not remove from queue_file */
>>> +	list_for_each_entry_safe(gref, next, &queue_file, next_file) {
>>> 		__del_gref(gref);
>>> +		list_del(&gref->next_file);
>>> 	}
>>> 
>>> 	/* It's possible for the target domain to map the just-allocated grant

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)
  2014-08-29 14:05     ` David Vrabel
  2014-08-29 16:46       ` Dave Scott
@ 2014-08-29 21:15       ` Dave Scott
  2014-09-01 10:19         ` David Vrabel
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Scott @ 2014-08-29 21:15 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Boris Ostrovsky, John Else, Anil Madhavapeddy

Hi,

On 29 Aug 2014, at 15:05, David Vrabel <david.vrabel@citrix.com> wrote:

> On 29/08/14 13:40, Dave Scott wrote:
>> Hi,
>> 
>> On 28 Aug 2014, at 14:50, David Vrabel <david.vrabel@citrix.com> wrote:
>> 
>>> On 27/08/14 22:33, Dave Scott wrote:
>>>> I notice xc_gntshr_munmap for Linux simply calls 'munmap'
>>>> 
>>>> static int linux_gntshr_munmap(xc_gntshr *xcg, xc_osdep_handle h,
>>>>                              void *start_address, uint32_t count)
>>>> {
>>>>   return munmap(start_address, count);
>>>> }
>>> 
>>> munmap() needs a byte length, not a page count.
>>> 
>>> When using xc_gntshr_munmap() with multiple pages this results in none
>>> of the grefs being deleted (unshared and freed) since a mapping to some
>>> of the grefs in the set remain.
>> 
>> Aha, good spot. I worked around this in my test program by calling xc_gntshr_munmap with (count * 4096) and it’s no longer leaking.
>> 
>>> This doesn't appear to explain why they're not deleted by the device is
>>> closed.
>> 
>> After rebuilding xen-gntalloc with your change I couldn’t reproduce this. I tried all combinations of
>> 
>>  * map 1 or 2 pages
>>  * unmap 0, 1 or 2 pages
>> 
>> and, although some of the iterations did run out of grant references (as expected), nothing seemed to be leaked over a close.
> 
> Ok,  Are you going to submit a libxc patch fixing the munmap() length?
> 
>>>> -- so I guess the problem is with the xen-gntalloc driver?
>>>> 
>>>> If I share single pages at a time then it triggers a BUG:
>>>> $ sudo ./test-gnt 1
>>>> [  148.564281] BUG: unable to handle kernel paging request at ffffc908001bff20
>>>> [  148.564299] IP: [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>>> [  148.564312] PGD 3d520067 PUD 0
>>>> [  148.564317] Oops: 0000 [#1] SMP
>>>> [  148.564322] CPU 0
>>>> [  148.564325] Modules linked in: xenfs xen_evtchn xen_gntalloc xen_gntdev lp parport
>>>> [  148.564337]
>>>> [  148.564340] Pid: 897, comm: test-gnt Not tainted 3.2.0-67-generic #101-Ubuntu
>>>> [  148.564348] RIP: e030:[<ffffffff813acf93>]  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>>> [  148.564356] RSP: e02b:ffff88003c655da0  EFLAGS: 00010286
>>>> [  148.564360] RAX: ffffc900001c0000 RBX: ffff88003cdb9e40 RCX: 0000000000000000
>>>> [  148.564365] RDX: 0000000000000000 RSI: 000000000007026e RDI: 00000000ffffffe4
>>>> [  148.564371] RBP: ffff88003c655dd8 R08: 0000000000000000 R09: 000000000003725f
>>>> [  148.564376] R10: ffffea0000ef3680 R11: 0000000000000000 R12: ffff88003cdb9e40
>>>> [  148.564381] R13: 0000000000000000 R14: ffff88003c655e80 R15: 0000000000000000
>>>> [  148.564389] FS:  00007ffe79406740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
>>>> [  148.564394] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [  148.564400] CR2: ffffc908001bff20 CR3: 000000003cdc6000 CR4: 0000000000000660
>>>> [  148.564406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [  148.564412] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> [  148.564418] Process test-gnt (pid: 897, threadinfo ffff88003c654000, task ffff88003cdd4500)
>>>> [  148.564423] Stack:
>>>> [  148.564426]  ffffffffa000d1a5 ffff88003c655dd8 ffffffff813adbdb 00000000ffffffe4
>>>> [  148.564435]  0000000000000000 00000000ffffffe4 ffff88003cdb9e40 ffff88003c655e68
>>>> [  148.564443]  ffffffffa000d848 ffff88003cc47790 ffff88003c5a8dc0 ffff8800041aeba8
>>>> [  148.564452] Call Trace:
>>>> [  148.564459]  [<ffffffffa000d1a5>] ? __del_gref+0x105/0x150 [xen_gntalloc]
>>>> [  148.564465]  [<ffffffff813adbdb>] ? gnttab_grant_foreign_access+0x2b/0x80
>>>> [  148.564471]  [<ffffffffa000d848>] add_grefs+0x1c8/0x2b0 [xen_gntalloc]
>>>> [  148.564478]  [<ffffffffa000da28>] gntalloc_ioctl_alloc+0xf8/0x160 [xen_gntalloc]
>>>> [  148.564485]  [<ffffffffa000dae0>] gntalloc_ioctl+0x50/0x64 [xen_gntalloc]
>>>> [  148.564492]  [<ffffffff8118d45a>] do_vfs_ioctl+0x8a/0x340
>>>> [  148.564498]  [<ffffffff811456b3>] ? do_munmap+0x1f3/0x2f0
>>>> [  148.564504]  [<ffffffff8118d7a1>] sys_ioctl+0x91/0xa0
>>>> [  148.564510]  [<ffffffff8166bd42>] system_call_fastpath+0x16/0x1b
>>>> [  148.564515] Code: f8 48 8b 15 98 89 b6 00 66 89 04 fa 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 89 b6 00 89 ff 5d <0f> b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
>>>> [  148.564577] RIP  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>>> [  148.564583]  RSP <ffff88003c655da0>
>>>> [  148.564586] CR2: ffffc908001bff20
>>>> [  148.564591] ---[ end trace 57b3a513f0d79bd6 ]---
>>> 
>>> Does this patch fix the oops?
>> 
>> Yes, I’ve left my test case running for several hours with no sign of trouble.
> 
> Thanks. I'll take that as a
> 
> Tested-by: Dave Scott <david.scott@citrix.com>

I just realised that my tests weren’t as clear-cut as I hoped. When I rebuilt my ubuntu kernel I accidentally got a free upgrade from 3.2 to 3.13, so what I actually tested was:

3.2: kernel oops triggered easily
3.13 plus your patch: seems solid

I could go back and test 3.13 minus your patch if that would be helpful.

Cheers,
Dave

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)
  2014-08-29 21:15       ` Dave Scott
@ 2014-09-01 10:19         ` David Vrabel
  2014-09-02  9:13           ` Dave Scott
  0 siblings, 1 reply; 9+ messages in thread
From: David Vrabel @ 2014-09-01 10:19 UTC (permalink / raw)
  To: Dave Scott, David Vrabel
  Cc: xen-devel, Boris Ostrovsky, John Else, Anil Madhavapeddy

On 29/08/14 22:15, Dave Scott wrote:
> Hi,
> 
> On 29 Aug 2014, at 15:05, David Vrabel <david.vrabel@citrix.com> wrote:
> 
>> On 29/08/14 13:40, Dave Scott wrote:
>>> Hi,
>>>
>>> On 28 Aug 2014, at 14:50, David Vrabel <david.vrabel@citrix.com> wrote:
>>>
>>>> Does this patch fix the oops?
>>>
>>> Yes, I’ve left my test case running for several hours with no sign of trouble.
>>
>> Thanks. I'll take that as a
>>
>> Tested-by: Dave Scott <david.scott@citrix.com>
> 
> I just realised that my tests weren’t as clear-cut as I hoped. When
> I rebuilt my ubuntu kernel I accidentally got a free upgrade from 3.2 to
> 3.13, so what I actually tested was:
> 
> 3.2: kernel oops triggered easily
> 3.13 plus your patch: seems solid
> 
> I could go back and test 3.13 minus your patch if that would be helpful.

Please. I can't seem to easily trigger the race on my test system.  How
long does it take to fail typically in your case?

David

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)
  2014-09-01 10:19         ` David Vrabel
@ 2014-09-02  9:13           ` Dave Scott
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Scott @ 2014-09-02  9:13 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Boris Ostrovsky, John Else, Anil Madhavapeddy


On 1 Sep 2014, at 11:19, David Vrabel <david.vrabel@citrix.com> wrote:

> On 29/08/14 22:15, Dave Scott wrote:
>> Hi,
>> 
>> On 29 Aug 2014, at 15:05, David Vrabel <david.vrabel@citrix.com> wrote:
>> 
>>> On 29/08/14 13:40, Dave Scott wrote:
>>>> Hi,
>>>> 
>>>> On 28 Aug 2014, at 14:50, David Vrabel <david.vrabel@citrix.com> wrote:
>>>> 
>>>>> Does this patch fix the oops?
>>>> 
>>>> Yes, I’ve left my test case running for several hours with no sign of trouble.
>>> 
>>> Thanks. I'll take that as a
>>> 
>>> Tested-by: Dave Scott <david.scott@citrix.com>
>> 
>> I just realised that my tests weren’t as clear-cut as I hoped. When
>> I rebuilt my ubuntu kernel I accidentally got a free upgrade from 3.2 to
>> 3.13, so what I actually tested was:
>> 
>> 3.2: kernel oops triggered easily
>> 3.13 plus your patch: seems solid
>> 
>> I could go back and test 3.13 minus your patch if that would be helpful.
> 
> Please. I can't seem to easily trigger the race on my test system.  How
> long does it take to fail typically in your case?

I tested it on 3.13 minus your patch and it works fine.

On 3.2 it manifested pretty-much instantly. I can’t make it fail on 3.13 at all.

Sorry if I’ve wasted your time!

Dave

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)
  2014-08-27 21:33 xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?) Dave Scott
  2014-08-28 13:50 ` David Vrabel
@ 2014-09-02 13:55 ` David Vrabel
  1 sibling, 0 replies; 9+ messages in thread
From: David Vrabel @ 2014-09-02 13:55 UTC (permalink / raw)
  To: Dave Scott, xen-devel; +Cc: John Else, Anil Madhavapeddy

On 27/08/14 22:33, Dave Scott wrote:
> Hi,
> 
> I've been playing with gntshr (as used by libvchan) and have noticed a
> few problems. Firstly if I use xc_gntshr_share_pages to share > 1 page
> then it seems to leak after xc_gntshr_munmap:
[...]
> ... subsequent runs fail earlier and earlier. I added some printf debugging
> and noticed that the address returned by xc_gntshr_share_pages was decreasing
> by 0x1000 per iteration, suggesting that the xc_gntshr_munmap was unmapping
> the first page but missing the second.

FYI, there's a kernel fix 243082e0d59f (xen/gntalloc: fix reference
counts on multi-page mappings) for multi-page mappings which is included
in 3.3 and later.

> [  148.564340] Pid: 897, comm: test-gnt Not tainted 3.2.0-67-generic #101-Ubuntu
> [  148.564348] RIP: e030:[<ffffffff813acf93>]  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
[...]
> [  148.564452] Call Trace:
> [  148.564459]  [<ffffffffa000d1a5>] ? __del_gref+0x105/0x150 [xen_gntalloc]
> [  148.564465]  [<ffffffff813adbdb>] ? gnttab_grant_foreign_access+0x2b/0x80
> [  148.564471]  [<ffffffffa000d848>] add_grefs+0x1c8/0x2b0 [xen_gntalloc]
> [  148.564478]  [<ffffffffa000da28>] gntalloc_ioctl_alloc+0xf8/0x160 [xen_gntalloc]
> [  148.564485]  [<ffffffffa000dae0>] gntalloc_ioctl+0x50/0x64 [xen_gntalloc]
> [  148.564492]  [<ffffffff8118d45a>] do_vfs_ioctl+0x8a/0x340
> [  148.564498]  [<ffffffff811456b3>] ? do_munmap+0x1f3/0x2f0
> [  148.564504]  [<ffffffff8118d7a1>] sys_ioctl+0x91/0xa0
> [  148.564510]  [<ffffffff8166bd42>] system_call_fastpath+0x16/0x1b

This happens if the allocate ioctl fails because there are no more grant
references.  This happened in your case because of the bug in 3.2 (see
above). But it is triggerable in 3.17-rc3 if you increase the gntalloc
limit (via the module parameter) or if you plug in a lots of VIFs/VBDs etc.

I'll send patches to the list shortly.

David

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-09-02 14:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-27 21:33 xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?) Dave Scott
2014-08-28 13:50 ` David Vrabel
2014-08-29 12:40   ` Dave Scott
2014-08-29 14:05     ` David Vrabel
2014-08-29 16:46       ` Dave Scott
2014-08-29 21:15       ` Dave Scott
2014-09-01 10:19         ` David Vrabel
2014-09-02  9:13           ` Dave Scott
2014-09-02 13:55 ` David Vrabel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.