All of lore.kernel.org
 help / color / mirror / Atom feed
* Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
@ 2011-03-30 10:28 Andreas Kinzler
  2011-04-29 19:39 ` Nathan March
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Kinzler @ 2011-03-30 10:28 UTC (permalink / raw)
  To: xen-devel

Hello,

in Xen 4.1.0 using "tap2:aio:/dev/sdaX" Xen leaks tapdisk2 processes. 
After running some VMs and terminating them, only dom0 running, "ps aux" 
shows:

root      4239  3.9  0.3  21648  3248 SLs  Mar29  48:56 tapdisk2
root      4243 33.1  0.3  21648  3248 SLs  Mar29 416:17 tapdisk2
root      4616  4.0  0.3  21648  3248 SLs  Mar29  51:07 tapdisk2
root      4621 28.8  0.3  21648  3248 SLs  Mar29 360:45 tapdisk2
root      8924  3.2  0.3  21648  3248 SLs  Mar29  27:09 tapdisk2

Regards Andreas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
  2011-03-30 10:28 Bug in Xen 4.1.0: Xen leaks tapdisk2 processes Andreas Kinzler
@ 2011-04-29 19:39 ` Nathan March
  2011-05-03 15:56   ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Nathan March @ 2011-04-29 19:39 UTC (permalink / raw)
  To: xen-devel


> in Xen 4.1.0 using "tap2:aio:/dev/sdaX" Xen leaks tapdisk2 processes.
> After running some VMs and terminating them, only dom0 running, "ps
> aux" shows:

I can definitely confirm this, on a machine with 2 active VMs:

xen1 ~ # ps auxf | grep -c tapdisk2
115

Although obviously many have been started/stopped/migrated away.
Stracing a few random ones shows them all waiting on fd8:

xen1 ~ # strace -p 11580
Process 11580 attached - interrupt to quit
select(8, [3 4 7], [], [], {425, 231813}^C <unfinished ...>
Process 11580 detached
xen1 ~ # strace -p 19110
Process 19110 attached - interrupt to quit
select(8, [3 4 7], [], [], {421, 228456}^C <unfinished ...>
Process 19110 detached
xen1 ~ # strace -p 20738
Process 20738 attached - interrupt to quit
select(8, [3 4 7], [], [], {415, 152712}^C <unfinished ...>
Process 20738 detached
xen1 ~ #

lsof:

tapdisk2 20738 root    3u  unix 0xffff880018585340        0t0 10490566
/var/run/blktap-control/ctl20738
tapdisk2 20738 root    4u  0000                0,8          0     1000
anon_inode
tapdisk2 20738 root    5u  unix 0xffff8800185858c0        0t0 10490568
socket
tapdisk2 20738 root    7u   CHR              251,3        0t0 10490549
/dev/xen/blktap-2/blktap3
tapdisk2 20738 root    8u   REG               0,22 2147483648 32367811
/mnt/xenDisks1/nathanxen2/swap (10.1.5.1:/vol/xenDisks1)

Let me know if I can grab any useful information for debugging.

- Nathan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
  2011-04-29 19:39 ` Nathan March
@ 2011-05-03 15:56   ` Ian Campbell
  2011-05-03 17:18     ` Nathan March
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2011-05-03 15:56 UTC (permalink / raw)
  To: Nathan March, Andreas Kinzler; +Cc: xen-devel

(add Andreas back, please don't drop cc's on xen-devel)

On Fri, 2011-04-29 at 20:39 +0100, Nathan March wrote:
> > in Xen 4.1.0 using "tap2:aio:/dev/sdaX" Xen leaks tapdisk2 processes.
> > After running some VMs and terminating them, only dom0 running, "ps
> > aux" shows:
> 
> I can definitely confirm this, on a machine with 2 active VMs:

I'm assuming this is xl specific and that xend isn't behaving this way
too, is that correct?

The following fixes it for xl, although I'm not exactly thrilled with
the solution.

Calling libxl__device_destroy_tapdisk from the generic
libxl__device_destroy is a bit skanky but is needed so that
libxl__devices_destroy (which just iterates over all device entries in
xenstore) also does the right thing.

The change to the tapctrl library API was necessary to expose the "id"
parameter necessary for tap_ctl_destroy.

I also seem to be seeing an inexplicable delay at the point at which the
tapdisk2 process is shutdown. I'm not sure where to look for clues as to
what happens there.

I know Ian J is looking at improving disk handling for 4.2 but I was
hoping for something somewhat backportable to 4.1 at this stage.

Does it work for you guys?

8<----------------------------

# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1304438002 -3600
# Node ID 4d0e906543dc115b5b2f90e0cb44f6affb3f1f99
# Parent  66f006f9ed34e58c10a56026bde41c116e680c8b
libxl: attempt to cleanup tapdisk processes on disk backend destroy.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

diff -r 66f006f9ed34 -r 4d0e906543dc tools/blktap2/control/tap-ctl-list.c
--- a/tools/blktap2/control/tap-ctl-list.c	Tue May 03 15:03:29 2011 +0100
+++ b/tools/blktap2/control/tap-ctl-list.c	Tue May 03 16:53:22 2011 +0100
@@ -506,17 +506,15 @@ out:
 }
 
 int
-tap_ctl_find_minor(const char *type, const char *path)
+tap_ctl_find(const char *type, const char *path, tap_list_t *tap)
 {
 	tap_list_t **list, **_entry;
-	int minor, err;
+	int ret = -ENOENT, err;
 
 	err = tap_ctl_list(&list);
 	if (err)
 		return err;
 
-	minor = -1;
-
 	for (_entry = list; *_entry != NULL; ++_entry) {
 		tap_list_t *entry  = *_entry;
 
@@ -526,11 +524,13 @@ tap_ctl_find_minor(const char *type, con
 		if (path && (!entry->path || strcmp(entry->path, path)))
 			continue;
 
-		minor = entry->minor;
+		*tap = *entry;
+		tap->type = tap->path = NULL;
+		ret = 0;
 		break;
 	}
 
 	tap_ctl_free_list(list);
 
-	return minor >= 0 ? minor : -ENOENT;
+	return ret;
 }
diff -r 66f006f9ed34 -r 4d0e906543dc tools/blktap2/control/tap-ctl.h
--- a/tools/blktap2/control/tap-ctl.h	Tue May 03 15:03:29 2011 +0100
+++ b/tools/blktap2/control/tap-ctl.h	Tue May 03 16:53:22 2011 +0100
@@ -76,7 +76,7 @@ int tap_ctl_get_driver_id(const char *ha
 
 int tap_ctl_list(tap_list_t ***list);
 void tap_ctl_free_list(tap_list_t **list);
-int tap_ctl_find_minor(const char *type, const char *path);
+int tap_ctl_find(const char *type, const char *path, tap_list_t *tap);
 
 int tap_ctl_allocate(int *minor, char **devname);
 int tap_ctl_free(const int minor);
diff -r 66f006f9ed34 -r 4d0e906543dc tools/libxl/libxl_blktap2.c
--- a/tools/libxl/libxl_blktap2.c	Tue May 03 15:03:29 2011 +0100
+++ b/tools/libxl/libxl_blktap2.c	Tue May 03 16:53:22 2011 +0100
@@ -18,6 +18,8 @@
 
 #include "tap-ctl.h"
 
+#include <string.h>
+
 int libxl__blktap_enabled(libxl__gc *gc)
 {
     const char *msg;
@@ -30,12 +32,13 @@ char *libxl__blktap_devpath(libxl__gc *g
 {
     const char *type;
     char *params, *devname = NULL;
-    int minor, err;
+    tap_list_t tap;
+    int err;
 
     type = libxl__device_disk_string_of_format(format);
-    minor = tap_ctl_find_minor(type, disk);
-    if (minor >= 0) {
-        devname = libxl__sprintf(gc, "/dev/xen/blktap-2/tapdev%d", minor);
+    err = tap_ctl_find(type, disk, &tap);
+    if (err == 0) {
+        devname = libxl__sprintf(gc, "/dev/xen/blktap-2/tapdev%d", tap.minor);
         if (devname)
             return devname;
     }
@@ -49,3 +52,28 @@ char *libxl__blktap_devpath(libxl__gc *g
 
     return NULL;
 }
+
+
+void libxl__device_destroy_tapdisk(libxl__gc *gc, char *be_path)
+{
+    char *path, *params, *type, *disk;
+    int err;
+    tap_list_t tap;
+
+    path = libxl__sprintf(gc, "%s/tapdisk-params", be_path);
+    if (!path) return;
+
+    params = libxl__xs_read(gc, XBT_NULL, path);
+    if (!params) return;
+
+    type = params;
+    disk = strchr(params, ':');
+    if (!disk) return;
+
+    *disk++ = '\0';
+
+    err = tap_ctl_find(type, disk, &tap);
+    if (err < 0) return;
+
+    tap_ctl_destroy(tap.id, tap.minor);
+}
diff -r 66f006f9ed34 -r 4d0e906543dc tools/libxl/libxl_device.c
--- a/tools/libxl/libxl_device.c	Tue May 03 15:03:29 2011 +0100
+++ b/tools/libxl/libxl_device.c	Tue May 03 16:53:22 2011 +0100
@@ -253,6 +253,7 @@ int libxl__device_destroy(libxl__gc *gc,
     if (!state)
         goto out;
     if (atoi(state) != 4) {
+        libxl__device_destroy_tapdisk(gc, be_path);
         xs_rm(ctx->xsh, XBT_NULL, be_path);
         goto out;
     }
@@ -273,6 +274,7 @@ retry_transaction:
         xs_watch(ctx->xsh, state_path, be_path);
         rc = 1;
     }
+    libxl__device_destroy_tapdisk(gc, be_path);
 out:
     return rc;
 }
diff -r 66f006f9ed34 -r 4d0e906543dc tools/libxl/libxl_internal.h
--- a/tools/libxl/libxl_internal.h	Tue May 03 15:03:29 2011 +0100
+++ b/tools/libxl/libxl_internal.h	Tue May 03 16:53:22 2011 +0100
@@ -346,6 +346,12 @@ _hidden char *libxl__blktap_devpath(libx
                                     const char *disk,
                                     libxl_disk_format format);
 
+/* libxl__device_destroy_tapdisk:
+ *   Destroys any tapdisk process associated with the backend represented
+ *   by be_path.
+ */
+_hidden void libxl__device_destroy_tapdisk(libxl__gc *gc, char *be_path);
+
 _hidden char *libxl__uuid2string(libxl__gc *gc, const libxl_uuid uuid);
 
 struct libxl__xen_console_reader {
diff -r 66f006f9ed34 -r 4d0e906543dc tools/libxl/libxl_noblktap2.c
--- a/tools/libxl/libxl_noblktap2.c	Tue May 03 15:03:29 2011 +0100
+++ b/tools/libxl/libxl_noblktap2.c	Tue May 03 16:53:22 2011 +0100
@@ -27,3 +27,7 @@ char *libxl__blktap_devpath(libxl__gc *g
 {
     return NULL;
 }
+
+void libxl__device_destroy_tapdisk(libxl__gc *gc, char *be_path)
+{
+}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
  2011-05-03 15:56   ` Ian Campbell
@ 2011-05-03 17:18     ` Nathan March
  2011-05-05  9:45       ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Nathan March @ 2011-05-03 17:18 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Andreas Kinzler


On 5/3/2011 8:56 AM, Ian Campbell wrote:
> On Fri, 2011-04-29 at 20:39 +0100, Nathan March wrote:
>>> > > in Xen 4.1.0 using "tap2:aio:/dev/sdaX" Xen leaks tapdisk2 processes.
>>> > > After running some VMs and terminating them, only dom0 running, "ps
>>> > > aux" shows:
>> > 
>> > I can definitely confirm this, on a machine with 2 active VMs:
> I'm assuming this is xl specific and that xend isn't behaving this way
> too, is that correct?

That's a good question, I'm not really sure. I'm doing all the vm
creation/migration/shutdown using libvirt 0.9.0 rc1 and just connecting
with xen:/// to the local xend.  I thought xl was just a userspace
replacement for xm and the backend functionality was the same?

I don't have a spare server to test the patch with at the moment, but I
can try this out later this week.

- Nathan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
  2011-05-03 17:18     ` Nathan March
@ 2011-05-05  9:45       ` Ian Campbell
  2011-05-06 18:27         ` Jim Fehlig
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2011-05-05  9:45 UTC (permalink / raw)
  To: Nathan March; +Cc: Jim Fehlig, xen-devel, Andreas Kinzler, Daniel Stodden

On Tue, 2011-05-03 at 18:18 +0100, Nathan March wrote:
> On 5/3/2011 8:56 AM, Ian Campbell wrote:
> > On Fri, 2011-04-29 at 20:39 +0100, Nathan March wrote:
> >>> > > in Xen 4.1.0 using "tap2:aio:/dev/sdaX" Xen leaks tapdisk2 processes.
> >>> > > After running some VMs and terminating them, only dom0 running, "ps
> >>> > > aux" shows:
> >> > 
> >> > I can definitely confirm this, on a machine with 2 active VMs:
> > I'm assuming this is xl specific and that xend isn't behaving this way
> > too, is that correct?
> 
> That's a good question, I'm not really sure. I'm doing all the vm
> creation/migration/shutdown using libvirt 0.9.0 rc1 and just connecting
> with xen:/// to the local xend.  I thought xl was just a userspace
> replacement for xm and the backend functionality was the same?

xl is a complete standalone toolstack, including the backend
functionality.

> I don't have a spare server to test the patch with at the moment, but I
> can try this out later this week.

If you are running xm/xend rather than xl then it won't help.

But I'm not sure how one tells with libvrit which you are running, I'd
expect that if xend were running it would be used by default. Jim?

Ian.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
  2011-05-05  9:45       ` Ian Campbell
@ 2011-05-06 18:27         ` Jim Fehlig
  2011-05-06 23:25           ` Nathan March
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Fehlig @ 2011-05-06 18:27 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Nathan March, Andreas Kinzler, Daniel Stodden

Ian Campbell wrote:
> On Tue, 2011-05-03 at 18:18 +0100, Nathan March wrote:
>   
>> On 5/3/2011 8:56 AM, Ian Campbell wrote:
>>     
>>> On Fri, 2011-04-29 at 20:39 +0100, Nathan March wrote:
>>>       
>>>>>>> in Xen 4.1.0 using "tap2:aio:/dev/sdaX" Xen leaks tapdisk2 processes.
>>>>>>> After running some VMs and terminating them, only dom0 running, "ps
>>>>>>> aux" shows:
>>>>>>>               
>>>>> I can definitely confirm this, on a machine with 2 active VMs:
>>>>>           
>>> I'm assuming this is xl specific and that xend isn't behaving this way
>>> too, is that correct?
>>>       
>> That's a good question, I'm not really sure. I'm doing all the vm
>> creation/migration/shutdown using libvirt 0.9.0 rc1 and just connecting
>> with xen:/// to the local xend.  I thought xl was just a userspace
>> replacement for xm and the backend functionality was the same?
>>     

libvirt 0.9.0 was released.  In fact, 0.9.1 is now out.

>
> xl is a complete standalone toolstack, including the backend
> functionality.
>
>   
>> I don't have a spare server to test the patch with at the moment, but I
>> can try this out later this week.
>>     
>
> If you are running xm/xend rather than xl then it won't help.
>
> But I'm not sure how one tells with libvrit which you are running, I'd
> expect that if xend were running it would be used by default. Jim?
>   

If xend is running, libvirt will use it.  If not, it will attempt to use
libxenlight.  'virsh version' will tell which xen backend you are using.

E.g. if xend is running:
xen33: # virsh version
Compiled against library: libvir 0.9.0
Using library: libvir 0.9.0
Using API: Xen 3.0.1

If xend is not running:
xen33: # virsh version
Compiled against library: libvir 0.9.0
Using library: libvir 0.9.0
Using API: xenlight 0.9.0

Looks like I need to put libxenlight's version in there instead of
libvirt's version, but 'Xen' vs. 'xenlight' will tell which libvirt
backend is being used.

Regards,
Jim

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
  2011-05-06 18:27         ` Jim Fehlig
@ 2011-05-06 23:25           ` Nathan March
  2011-05-09  8:23             ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Nathan March @ 2011-05-06 23:25 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: Ian Campbell, xen-devel, Andreas Kinzler, Daniel Stodden



On 5/6/2011 11:27 AM, Jim Fehlig wrote:
>>
>>> I don't have a spare server to test the patch with at the moment, but I
>>> can try this out later this week.
>>>     
>> If you are running xm/xend rather than xl then it won't help.
>>
>> But I'm not sure how one tells with libvrit which you are running, I'd
>> expect that if xend were running it would be used by default. Jim?
>>   
> If xend is running, libvirt will use it.  If not, it will attempt to use
> libxenlight.  'virsh version' will tell which xen backend you are using.
>
> E.g. if xend is running:
> xen33: # virsh version
> Compiled against library: libvir 0.9.0
> Using library: libvir 0.9.0
> Using API: Xen 3.0.1
>
> If xend is not running:
> xen33: # virsh version
> Compiled against library: libvir 0.9.0
> Using library: libvir 0.9.0
> Using API: xenlight 0.9.0
>
> Looks like I need to put libxenlight's version in there instead of
> libvirt's version, but 'Xen' vs. 'xenlight' will tell which libvirt
> backend is being used.
>
In that case, I can confirm that I'm using xend:

xen1 ~ # virsh version
Compiled against library: libvir 0.9.0
Using library: libvir 0.9.0
Using API: Xen 3.0.1
Running hypervisor: Xen 4.1.0

- Nathan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
  2011-05-06 23:25           ` Nathan March
@ 2011-05-09  8:23             ` Ian Campbell
  2011-05-10 19:17               ` Daniel Stodden
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2011-05-09  8:23 UTC (permalink / raw)
  To: Nathan March; +Cc: Jim Fehlig, xen-devel, Andreas Kinzler, Daniel Stodden

On Sat, 2011-05-07 at 00:25 +0100, Nathan March wrote:
> 
> On 5/6/2011 11:27 AM, Jim Fehlig wrote:
> >>
> >>> I don't have a spare server to test the patch with at the moment, but I
> >>> can try this out later this week.
> >>>     
> >> If you are running xm/xend rather than xl then it won't help.
> >>
> >> But I'm not sure how one tells with libvrit which you are running, I'd
> >> expect that if xend were running it would be used by default. Jim?
> >>   
> > If xend is running, libvirt will use it.  If not, it will attempt to use
> > libxenlight.  'virsh version' will tell which xen backend you are using.
> >
> > E.g. if xend is running:
> > xen33: # virsh version
> > Compiled against library: libvir 0.9.0
> > Using library: libvir 0.9.0
> > Using API: Xen 3.0.1
> >
> > If xend is not running:
> > xen33: # virsh version
> > Compiled against library: libvir 0.9.0
> > Using library: libvir 0.9.0
> > Using API: xenlight 0.9.0
> >
> > Looks like I need to put libxenlight's version in there instead of
> > libvirt's version, but 'Xen' vs. 'xenlight' will tell which libvirt
> > backend is being used.
> >
> In that case, I can confirm that I'm using xend:

Hrm, then my earlier patch is irrelevant and I've got no idea what is
supposed to cause the tapdisk process to exit in either the xend or xl
case but it seems like the issue is common to both -- Daniel, any
ideas?. 

> 
> xen1 ~ # virsh version
> Compiled against library: libvir 0.9.0
> Using library: libvir 0.9.0
> Using API: Xen 3.0.1
> Running hypervisor: Xen 4.1.0
> 
> - Nathan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
  2011-05-09  8:23             ` Ian Campbell
@ 2011-05-10 19:17               ` Daniel Stodden
  2011-05-11  8:43                 ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Stodden @ 2011-05-10 19:17 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Andreas, Jim Fehlig, xen-devel, Nathan March, Kinzler

On Mon, 2011-05-09 at 04:23 -0400, Ian Campbell wrote:
> On Sat, 2011-05-07 at 00:25 +0100, Nathan March wrote:
> > 
> > On 5/6/2011 11:27 AM, Jim Fehlig wrote:
> > >>
> > >>> I don't have a spare server to test the patch with at the moment, but I
> > >>> can try this out later this week.
> > >>>     
> > >> If you are running xm/xend rather than xl then it won't help.
> > >>
> > >> But I'm not sure how one tells with libvrit which you are running, I'd
> > >> expect that if xend were running it would be used by default. Jim?
> > >>   
> > > If xend is running, libvirt will use it.  If not, it will attempt to use
> > > libxenlight.  'virsh version' will tell which xen backend you are using.
> > >
> > > E.g. if xend is running:
> > > xen33: # virsh version
> > > Compiled against library: libvir 0.9.0
> > > Using library: libvir 0.9.0
> > > Using API: Xen 3.0.1
> > >
> > > If xend is not running:
> > > xen33: # virsh version
> > > Compiled against library: libvir 0.9.0
> > > Using library: libvir 0.9.0
> > > Using API: xenlight 0.9.0
> > >
> > > Looks like I need to put libxenlight's version in there instead of
> > > libvirt's version, but 'Xen' vs. 'xenlight' will tell which libvirt
> > > backend is being used.
> > >
> > In that case, I can confirm that I'm using xend:
> 
> Hrm, then my earlier patch is irrelevant and I've got no idea what is
> supposed to cause the tapdisk process to exit in either the xend or xl
> case but it seems like the issue is common to both -- Daniel, any
> ideas?. 

This stuff was originally written with toolstacks in mind which already
manage storage in more detail than just plug/unplug, so tap-ctl only
provides the minimum tool, not the framework. XCP will refcount the
node's usage, and shut down once dropping back to zero.

Does XL promote storage shared across VMs? Does XL have a big lock? If
no + yes then shutting down after the VBD should have worked.

Otherwise it gets more complicated. Try/error, i.e. calling destroy and
bailing out if the device node is found busy is not fully reliable, see
my other mail regarding bdev access noise. And plug/unplugs by
concurrent XLs interleaving will obviously race.

I can offer a patch which adds a timeout to destroy (possibly as 0), but
in theory the same issue obviously remains.

Usually it boils down to a refcount. Could go into xenstore, sth
like /local/domain/<me>/blktap/<minor>/refs, plus transactions. I think
that way even XCP might go use it at some point.

Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in Xen 4.1.0: Xen leaks tapdisk2 processes
  2011-05-10 19:17               ` Daniel Stodden
@ 2011-05-11  8:43                 ` Ian Campbell
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Campbell @ 2011-05-11  8:43 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Jim Fehlig, xen-devel, Nathan March, Andreas Kinzler

On Tue, 2011-05-10 at 20:17 +0100, Daniel Stodden wrote:
> On Mon, 2011-05-09 at 04:23 -0400, Ian Campbell wrote:
> > On Sat, 2011-05-07 at 00:25 +0100, Nathan March wrote:
> > > 
> > > On 5/6/2011 11:27 AM, Jim Fehlig wrote:
> > > >>
> > > >>> I don't have a spare server to test the patch with at the moment, but I
> > > >>> can try this out later this week.
> > > >>>     
> > > >> If you are running xm/xend rather than xl then it won't help.
> > > >>
> > > >> But I'm not sure how one tells with libvrit which you are running, I'd
> > > >> expect that if xend were running it would be used by default. Jim?
> > > >>   
> > > > If xend is running, libvirt will use it.  If not, it will attempt to use
> > > > libxenlight.  'virsh version' will tell which xen backend you are using.
> > > >
> > > > E.g. if xend is running:
> > > > xen33: # virsh version
> > > > Compiled against library: libvir 0.9.0
> > > > Using library: libvir 0.9.0
> > > > Using API: Xen 3.0.1
> > > >
> > > > If xend is not running:
> > > > xen33: # virsh version
> > > > Compiled against library: libvir 0.9.0
> > > > Using library: libvir 0.9.0
> > > > Using API: xenlight 0.9.0
> > > >
> > > > Looks like I need to put libxenlight's version in there instead of
> > > > libvirt's version, but 'Xen' vs. 'xenlight' will tell which libvirt
> > > > backend is being used.
> > > >
> > > In that case, I can confirm that I'm using xend:
> > 
> > Hrm, then my earlier patch is irrelevant and I've got no idea what is
> > supposed to cause the tapdisk process to exit in either the xend or xl
> > case but it seems like the issue is common to both -- Daniel, any
> > ideas?. 
> 
> This stuff was originally written with toolstacks in mind which already
> manage storage in more detail than just plug/unplug, so tap-ctl only
> provides the minimum tool, not the framework. XCP will refcount the
> node's usage, and shut down once dropping back to zero.
> 
> Does XL promote storage shared across VMs? Does XL have a big lock? If
> no + yes then shutting down after the VBD should have worked.

I don't think XL can share tapdisks between VMs (neither does xend
AFAIK, how would that happen?).

I think we just want to attach the disk on VM create (or hotplug) and
detach it on VM destroy (or hotunplug). IOW the reference count can only
ever be 0 or 1.

> 
> Otherwise it gets more complicated. Try/error, i.e. calling destroy and
> bailing out if the device node is found busy is not fully reliable, see
> my other mail regarding bdev access noise. And plug/unplugs by
> concurrent XLs interleaving will obviously race.
> 
> I can offer a patch which adds a timeout to destroy (possibly as 0), but
> in theory the same issue obviously remains.
> 
> Usually it boils down to a refcount. Could go into xenstore, sth
> like /local/domain/<me>/blktap/<minor>/refs, plus transactions. I think
> that way even XCP might go use it at some point.
> 
> Daniel
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-05-11  8:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-30 10:28 Bug in Xen 4.1.0: Xen leaks tapdisk2 processes Andreas Kinzler
2011-04-29 19:39 ` Nathan March
2011-05-03 15:56   ` Ian Campbell
2011-05-03 17:18     ` Nathan March
2011-05-05  9:45       ` Ian Campbell
2011-05-06 18:27         ` Jim Fehlig
2011-05-06 23:25           ` Nathan March
2011-05-09  8:23             ` Ian Campbell
2011-05-10 19:17               ` Daniel Stodden
2011-05-11  8:43                 ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.