All of lore.kernel.org
 help / color / mirror / Atom feed
* Basic blktap2 functionality issues.
@ 2012-03-29 14:39 greg
  2012-03-29 17:35 ` Stefano Stabellini
  2012-03-30  8:17 ` Ian Campbell
  0 siblings, 2 replies; 10+ messages in thread
From: greg @ 2012-03-29 14:39 UTC (permalink / raw)
  To: xen-devel

Hi, hope the day is going well for everyone.

I had posted a note about this issue two weeks ago and didn't get any
response.  I don't know if that indicates that people are not using
blktap or if the question stumped everyone.

First of all there is a documentable issue with blktap in 4.1.2 and
xl.  Anyone trying to use it to export files to a guest will be
affected.  Using xl to do 'block-attach' and 'block-detach' in dom0
also doesn't work in stock 4.1.2 so I suspect there are generic issues
with blktap and xl.

In stock 4.1.2 using xl results in the tapdisk2 server process being
orphaned.  There is a patch floating around from Ian Campbell which
fixes this problem and either creates or uncovers what may be a more
fundamental problem.

With Ian's patch applied the tapdisk2 process terminates but the
tapdisk device is not released resulting in a steady accumulation of
orphan minor numbers.  The underlying cause of this appears to be a
resource deadlock between xl requesting a detach of the VBD and the
tapdisk2 process.

Ian actually acknowledges the phenomenon in one of his posts saying
there appeared to be a 'hang' when the VBD is released with his
patches.  The 'hang' is actually a livelock where the entire kernel
blocks until the select call from xl to the tapdisk2 process times out
and rescues things.  The error return from the select call is what
causes the xl code to not complete the release of the tapdisk minor.

I'm still hunting through the code maze trying to find the underlying
problem.  The tapdisk2 process is hanging on the munmap of the ring
buffers.  It 'feels' as if the problem may be secondary to xl still
holding a reference to some resource which blocks the unmap of the
ring buffers.

I will continue to hunt but if anyone has any pointers send them my
way.  It takes a bit of time to come up to speed on all the code
paths.

In a broader context I think the XEN community needs to take a
reasoned look at how to handle the file issue.  It appears as if Dan
Stodden has disappeared so I get the sense the entire blktap2
architecture is a bit rudderless (no judgement just observation).  I'm
trying to put my money where my mouth is and actually hunt for the
problems which are there.

I've seen rumbles on LKML about discussions with regards to
modifications to loop in order to deal with the page cache issues
which hinder the reliability of delivering virtual block devices over
loop.  I don't know how far out that is or even if it will eventuate.
I've also heard rumbles about a mythical 'blktap3' which runs
completely in userspace.  If that is the direction things are going I
would certainly be willing to hammer on that rather then put more time
into blktap2 if there is 'blktap3' code someplace.

I will look forward to any comments or suggestions people may have.

Have a good weekend.

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"My thoughts on trusting Open-Source?  A quote I once saw said it
 best: 'Remember, Amateurs built the ark.  Professionals built the
 Titanic.'  Perhaps most significantly the ark was one guy, there were
 no doubt committees involved with the Titanic project."
                                -- Dr. G.W. Wettstein
                                   Resurrection

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Basic blktap2 functionality issues.
  2012-03-29 14:39 Basic blktap2 functionality issues greg
@ 2012-03-29 17:35 ` Stefano Stabellini
  2012-03-30  8:02   ` Ian Campbell
  2012-03-30  8:17 ` Ian Campbell
  1 sibling, 1 reply; 10+ messages in thread
From: Stefano Stabellini @ 2012-03-29 17:35 UTC (permalink / raw)
  To: greg; +Cc: xen-devel

On Thu, 29 Mar 2012, greg@enjellic.com wrote:
> Hi, hope the day is going well for everyone.
> 
> I had posted a note about this issue two weeks ago and didn't get any
> response.  I don't know if that indicates that people are not using
> blktap or if the question stumped everyone.
> 
> First of all there is a documentable issue with blktap in 4.1.2 and
> xl.  Anyone trying to use it to export files to a guest will be
> affected.  Using xl to do 'block-attach' and 'block-detach' in dom0
> also doesn't work in stock 4.1.2 so I suspect there are generic issues
> with blktap and xl.
> 
> In stock 4.1.2 using xl results in the tapdisk2 server process being
> orphaned.  There is a patch floating around from Ian Campbell which
> fixes this problem and either creates or uncovers what may be a more
> fundamental problem.
> 
> With Ian's patch applied the tapdisk2 process terminates but the
> tapdisk device is not released resulting in a steady accumulation of
> orphan minor numbers.  The underlying cause of this appears to be a
> resource deadlock between xl requesting a detach of the VBD and the
> tapdisk2 process.
> 
> Ian actually acknowledges the phenomenon in one of his posts saying
> there appeared to be a 'hang' when the VBD is released with his
> patches.  The 'hang' is actually a livelock where the entire kernel
> blocks until the select call from xl to the tapdisk2 process times out
> and rescues things.  The error return from the select call is what
> causes the xl code to not complete the release of the tapdisk minor.
> 
> I'm still hunting through the code maze trying to find the underlying
> problem.  The tapdisk2 process is hanging on the munmap of the ring
> buffers.  It 'feels' as if the problem may be secondary to xl still
> holding a reference to some resource which blocks the unmap of the
> ring buffers.
> 
> I will continue to hunt but if anyone has any pointers send them my
> way.  It takes a bit of time to come up to speed on all the code
> paths.
> 
> In a broader context I think the XEN community needs to take a
> reasoned look at how to handle the file issue.  It appears as if Dan
> Stodden has disappeared so I get the sense the entire blktap2
> architecture is a bit rudderless (no judgement just observation).  I'm
> trying to put my money where my mouth is and actually hunt for the
> problems which are there.
> 
> I've seen rumbles on LKML about discussions with regards to
> modifications to loop in order to deal with the page cache issues
> which hinder the reliability of delivering virtual block devices over
> loop.  I don't know how far out that is or even if it will eventuate.
> I've also heard rumbles about a mythical 'blktap3' which runs
> completely in userspace.  If that is the direction things are going I
> would certainly be willing to hammer on that rather then put more time
> into blktap2 if there is 'blktap3' code someplace.
> 
> I will look forward to any comments or suggestions people may have.

A completely userspace blktap version is indeed in the work, but it is
not clear yet when it is going to be ready.

Alternatively if you don't need VHD format support, you can use QCOW or
QCOW2 with upstream QEMU in xen-unstable, that leads to very good
performances and still provides copy on write support.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Basic blktap2 functionality issues.
  2012-03-29 17:35 ` Stefano Stabellini
@ 2012-03-30  8:02   ` Ian Campbell
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Campbell @ 2012-03-30  8:02 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: greg, xen-devel

On Thu, 2012-03-29 at 18:35 +0100, Stefano Stabellini wrote:
> On Thu, 29 Mar 2012, greg@enjellic.com wrote:
> > Hi, hope the day is going well for everyone.
> > 
> > I had posted a note about this issue two weeks ago and didn't get any
> > response.  I don't know if that indicates that people are not using
> > blktap or if the question stumped everyone.
> > 
> > First of all there is a documentable issue with blktap in 4.1.2 and
> > xl.  Anyone trying to use it to export files to a guest will be
> > affected.  Using xl to do 'block-attach' and 'block-detach' in dom0
> > also doesn't work in stock 4.1.2 so I suspect there are generic issues
> > with blktap and xl.
> > 
> > In stock 4.1.2 using xl results in the tapdisk2 server process being
> > orphaned.  There is a patch floating around from Ian Campbell which
> > fixes this problem and either creates or uncovers what may be a more
> > fundamental problem.
> > 
> > With Ian's patch applied the tapdisk2 process terminates but the
> > tapdisk device is not released resulting in a steady accumulation of
> > orphan minor numbers.  The underlying cause of this appears to be a
> > resource deadlock between xl requesting a detach of the VBD and the
> > tapdisk2 process.
> > 
> > Ian actually acknowledges the phenomenon in one of his posts saying
> > there appeared to be a 'hang' when the VBD is released with his
> > patches.  The 'hang' is actually a livelock where the entire kernel
> > blocks until the select call from xl to the tapdisk2 process times out
> > and rescues things.  The error return from the select call is what
> > causes the xl code to not complete the release of the tapdisk minor.
> > 
> > I'm still hunting through the code maze trying to find the underlying
> > problem.  The tapdisk2 process is hanging on the munmap of the ring
> > buffers.  It 'feels' as if the problem may be secondary to xl still
> > holding a reference to some resource which blocks the unmap of the
> > ring buffers.
> > 
> > I will continue to hunt but if anyone has any pointers send them my
> > way.  It takes a bit of time to come up to speed on all the code
> > paths.
> > 
> > In a broader context I think the XEN community needs to take a
> > reasoned look at how to handle the file issue.  It appears as if Dan
> > Stodden has disappeared so I get the sense the entire blktap2
> > architecture is a bit rudderless (no judgement just observation).  I'm
> > trying to put my money where my mouth is and actually hunt for the
> > problems which are there.
> > 
> > I've seen rumbles on LKML about discussions with regards to
> > modifications to loop in order to deal with the page cache issues
> > which hinder the reliability of delivering virtual block devices over
> > loop.  I don't know how far out that is or even if it will eventuate.
> > I've also heard rumbles about a mythical 'blktap3' which runs
> > completely in userspace.  If that is the direction things are going I
> > would certainly be willing to hammer on that rather then put more time
> > into blktap2 if there is 'blktap3' code someplace.
> > 
> > I will look forward to any comments or suggestions people may have.
> 
> A completely userspace blktap version is indeed in the work, but it is
> not clear yet when it is going to be ready.
> 
> Alternatively if you don't need VHD format support, you can use QCOW or
> QCOW2 with upstream QEMU in xen-unstable, that leads to very good
> performances and still provides copy on write support.
> 

We should still support blktap2 in the 4.2 for people who have a kernel
with the necessary driver included therefore I still think the bug which
Greg reports needs investigating and hopefully fixing.

Ian.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Basic blktap2 functionality issues.
  2012-03-29 14:39 Basic blktap2 functionality issues greg
  2012-03-29 17:35 ` Stefano Stabellini
@ 2012-03-30  8:17 ` Ian Campbell
  2012-03-30 11:23   ` Ian Campbell
  1 sibling, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2012-03-30  8:17 UTC (permalink / raw)
  To: greg; +Cc: xen-devel

On Thu, 2012-03-29 at 15:39 +0100, greg@enjellic.com wrote:
> Hi, hope the day is going well for everyone.
> 
> I had posted a note about this issue two weeks ago and didn't get any
> response.  I don't know if that indicates that people are not using
> blktap or if the question stumped everyone.
> 
> First of all there is a documentable issue with blktap in 4.1.2 and
> xl.  Anyone trying to use it to export files to a guest will be
> affected.

The issue you refer to here is the deadlock describe below right? The
block device is actually functional while the guest is up and running?

>   Using xl to do 'block-attach' and 'block-detach' in dom0
> also doesn't work in stock 4.1.2 so I suspect there are generic issues
> with blktap and xl.

Stefano was looking at block-attach to dom0 more generally in
xen-unstable too.

> In stock 4.1.2 using xl results in the tapdisk2 server process being
> orphaned.  There is a patch floating around from Ian Campbell which
> fixes this problem and either creates or uncovers what may be a more
> fundamental problem.

That patch has been applied to xen-unstable -- does this hang occur for
you there? (I'm in the process of installing a suitable system to try
and reproduce).

> With Ian's patch applied the tapdisk2 process terminates but the
> tapdisk device is not released resulting in a steady accumulation of
> orphan minor numbers.  The underlying cause of this appears to be a
> resource deadlock between xl requesting a detach of the VBD and the
> tapdisk2 process.

When you say "the tapdisk2 process terminates" I guess you mean "tries
to terminate but gets blocked" rather than actually quits, since
otherwise I don't think there would be a deadlock?

I think an approach worth trying would be to have
tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
doing the actual detach. i.e. it would respond with "Yes, I will do
that" rather than "Yes, I have done that". My speculation is that this
will allow libxl to continue and hopefully avoid the deadlock.

A similar alternative would be to have tap_ctl_detach not wait for a
response at all.

In both cases I think it could be argued that if the
TAPDISK_MESSAGE_DETACH message does not result in tapdisk2 shutting down
there isn't much that libxl could do about it anyway so it might as well
get on with its life.

The other approach would be to figure out what libxl is doing after its
call (or perhaps just not doing) to tap_ctl_detach which actually has to
be done first e..g perhaps destroying the xenstore backend directory?

> I've also heard rumbles about a mythical 'blktap3' which runs
> completely in userspace.  If that is the direction things are going I
> would certainly be willing to hammer on that rather then put more time
> into blktap2 if there is 'blktap3' code someplace.

I expect that a lot of the control-plane stuff will be the same between
blktap2 and 3 so any issues fixed in 2 will likely help 3 too.

It is also worth fixing issues in blktap2 simply for the sake of fixing
them since there will still be people who are using it in 4.2.

Ian.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Basic blktap2 functionality issues.
  2012-03-30  8:17 ` Ian Campbell
@ 2012-03-30 11:23   ` Ian Campbell
  2012-04-02 15:51     ` Ian Jackson
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2012-03-30 11:23 UTC (permalink / raw)
  To: greg; +Cc: Ian Jackson, xen-devel

On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote:
> I think an approach worth trying would be to have
> tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
> doing the actual detach. i.e. it would respond with "Yes, I will do
> that" rather than "Yes, I have done that". My speculation is that this
> will allow libxl to continue and hopefully avoid the deadlock.

This seems to be the case as the following fixes things for me. Thanks
very much for your analysis which lead me to this solution...

Ian.

8<--------------------------------

# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1333106445 -3600
# Node ID 79e3dbe4b659e78408a9eea76c51a601bd4a383a
# Parent  947e095a6d29b5e90a01b7c06b298d4c91dcf3ac
tapdisk: respond to destroy request before tearing down the commuication channel

This avoids a deadlock while sending the response which leads to a 5s timeout
and leaked tapdisk processes. Instead send the response after getting the VBD
which appears to be the only step wich can fail (or at least the only one which
returns an error code).

Also call libxl__device_destroy_tapdisk before removing the backend directory
in xenstore since we need the tapdisk-params value.

Reported-by: Greg Wettstein <greg@enjellic.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

diff -r 947e095a6d29 -r 79e3dbe4b659 tools/blktap2/drivers/tapdisk-control.c
--- a/tools/blktap2/drivers/tapdisk-control.c	Tue Mar 27 14:09:11 2012 +0100
+++ b/tools/blktap2/drivers/tapdisk-control.c	Fri Mar 30 12:20:45 2012 +0100
@@ -379,10 +379,18 @@ tapdisk_control_detach_vbd(struct tapdis
 	int err;
 
 	vbd = tapdisk_server_get_vbd(request->cookie);
-	if (!vbd) {
-		err = -EINVAL;
-		goto out;
-	}
+	err = !vbd ? -EINVAL : 0;
+
+	memset(&response, 0, sizeof(response));
+	response.type = TAPDISK_MESSAGE_DETACH_RSP;
+	response.cookie = request->cookie;
+	response.u.response.error = -err;
+
+	tapdisk_control_write_message(connection->socket, &response, 2);
+	tapdisk_control_close_connection(connection);
+
+	if (err)
+		return;
 
 	tapdisk_vbd_detach(vbd);
 
@@ -390,16 +398,6 @@ tapdisk_control_detach_vbd(struct tapdis
 		tapdisk_server_remove_vbd(vbd);
 		free(vbd);
 	}
-
-	err = 0;
-out:
-	memset(&response, 0, sizeof(response));
-	response.type = TAPDISK_MESSAGE_DETACH_RSP;
-	response.cookie = request->cookie;
-	response.u.response.error = -err;
-
-	tapdisk_control_write_message(connection->socket, &response, 2);
-	tapdisk_control_close_connection(connection);
 }
 
 static void
diff -r 947e095a6d29 -r 79e3dbe4b659 tools/libxl/libxl_device.c
--- a/tools/libxl/libxl_device.c	Tue Mar 27 14:09:11 2012 +0100
+++ b/tools/libxl/libxl_device.c	Fri Mar 30 12:20:45 2012 +0100
@@ -432,11 +432,11 @@ int libxl__device_destroy(libxl__gc *gc,
     char *be_path = libxl__device_backend_path(gc, dev);
     char *fe_path = libxl__device_frontend_path(gc, dev);
 
+    libxl__device_destroy_tapdisk(gc, be_path);
+
     xs_rm(ctx->xsh, XBT_NULL, be_path);
     xs_rm(ctx->xsh, XBT_NULL, fe_path);
 
-    libxl__device_destroy_tapdisk(gc, be_path);
-
     return 0;
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Basic blktap2 functionality issues.
  2012-03-30 11:23   ` Ian Campbell
@ 2012-04-02 15:51     ` Ian Jackson
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Jackson @ 2012-04-02 15:51 UTC (permalink / raw)
  To: Ian Campbell; +Cc: greg, xen-devel

Ian Campbell writes ("Re: [Xen-devel] Basic blktap2 functionality issues."):
> On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote:
> > I think an approach worth trying would be to have
> > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
> > doing the actual detach. i.e. it would respond with "Yes, I will do
> > that" rather than "Yes, I have done that". My speculation is that this
> > will allow libxl to continue and hopefully avoid the deadlock.
> 
> This seems to be the case as the following fixes things for me. Thanks
> very much for your analysis which lead me to this solution...

Greg, can you confirm whether this works for you ?

Ian.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Basic blktap2 functionality issues.
@ 2012-04-17 15:55 Dr. Greg Wettstein
  0 siblings, 0 replies; 10+ messages in thread
From: Dr. Greg Wettstein @ 2012-04-17 15:55 UTC (permalink / raw)
  To: Ian Jackson, Ian Campbell; +Cc: greg, xen-devel

On Apr 2,  4:51pm, Ian Jackson wrote:
} Subject: Re: [Xen-devel] Basic blktap2 functionality issues.

Hi Ian, et. al, hope this note finds your day going well.

> Ian Campbell writes ("Re: [Xen-devel] Basic blktap2 functionality issues."):
> > On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote:
> > > I think an approach worth trying would be to have
> > > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
> > > doing the actual detach. i.e. it would respond with "Yes, I will do
> > > that" rather than "Yes, I have done that". My speculation is that this
> > > will allow libxl to continue and hopefully avoid the deadlock.
> > 
> > This seems to be the case as the following fixes things for me. Thanks
> > very much for your analysis which lead me to this solution...

> Greg, can you confirm whether this works for you ?

I've been grinding through the blktap2 issues and came up with some
additional findings I wanted to bounce off everyone.

The ability to reproduce this problem with the tap-ctl utility helped
confirm what the problem 'smelled like' from the beginning, ie. a hung
reference to the tapdev device instance which is ultimately preventing
the .release method from completing on the tapdisk2 side of things.

I started snooping around in xenstore and I believe I am seeing an
issue where xl does not release the the vbd instance in dom0.  This
causes a steady and persistent increase in the number of 'stale' vbd
instances held in dom0.  At a minimum this would suggest that xl is
not properly releasing resources properly which may help track the
whole issue down.

I replaced the tap: directive with a phy: device instance and saw the
orphan vbd instance as well so the problem doesn't seem related to a
blktap driver instance.

This is with the original patch applied which 'fixed' the problem with
the tapdisk2 process being left alive.  I'm going to back out that
patch to see if it reproduces with stock 4.1.2.

In the meantime let me know if anyone else is seeing this or if anyone
remembers a generic problem like this being fixed in xl in unstable.

> Ian.

Have a good afternoon.

}-- End of excerpt from Ian Jackson

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"We have more to fear from the bungling of the incompetent than from
 the machinations of the wicked."
                                -- Slashdot

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Basic blktap2 functionality issues.
@ 2012-04-11 15:13 Dr. Greg Wettstein
  0 siblings, 0 replies; 10+ messages in thread
From: Dr. Greg Wettstein @ 2012-04-11 15:13 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

On Apr 10, 10:04am, Ian Campbell wrote:
} Subject: Re: [Xen-devel] Basic blktap2 functionality issues.

> On Sat, 2012-04-07 at 16:59 +0100, Dr. Greg Wettstein wrote:
> > On Mar 30, 12:23pm, Ian Campbell wrote:
> > } Subject: Re: [Xen-devel] Basic blktap2 functionality issues.

> > > On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote:
> > > > I think an approach worth trying would be to have
> > > > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
> > > > doing the actual detach. i.e. it would respond with "Yes, I will do
> > > > that" rather than "Yes, I have done that". My speculation is that this
> > > > will allow libxl to continue and hopefully avoid the deadlock.
> > 
> > > This seems to be the case as the following fixes things for
> > > me. Thanks very much for your analysis which lead me to this
> > > solution...
> > 
> > I ported your fix into 4.1.2 but I think we still have a problem, at
> > least in this codebase.
> > 
> > I no longer see the select timeout delay when xl shuts down but upon
> > shutdown the minor number is not freed.  A 'tap-ctl list' shows a
> > steadily increasing set of orphaned minor numbers as VM's are started
> > up and shutdown.
> > 
> > Are you seeing this in your development codebase?

> It turns out that I am, yes. e.g. after starting/stopping a guest 3
> times:
> # tap-ctl list
>        -  0    -          - -
>        -  1    -          - -
>        -  2    -          - -

Yes, that is the same thing I'm seeing and would be expected.

> > The culprit is a failed ioctl call for BLKTAP2_IOCTL_FREE_TAP in
> > tap_ctl_free().  The underlying reason for the ioctl failure is the
> > check in [linuxsrc]:drivers/block/blktap/ring.c:blktap_ring_destroy()

> drivers/*xen*/... right? Or do you have a different blktap to me?

I believe the 'drivers/*xen*' was an old location.

You may have seen the note I sent to Tim Wood yesterday.  We isolated
the blktap2 code from the last GIT tree which Dan Stodden had
available and are carrying it as a standalone patch.  The version we
are testing against is available from the following location:

ftp://ftp.enjellic.com/pub/xen/blktap2-3.2.patch

Dan had the code in the following location:

	drivers/block/blktap

> > for whether or not the task_struct pointer in the blktap_ring
> > structure has been NULLed.
> > 
> > Which certainly makes sense since there is a race between xl's call to
> > tap_ctl_free() and tapdisk2 getting to the point where it can close
> > its descriptor to the blktap instance and thus invoke the .release
> > method which translates into a call to blktap_ring_release() which
> > NULL's the task_struct pointer.

> Sounds right, but then you would expect both the ioctl and release
> path to cleanup, depending on who loses the race?

The problem is the ioctl path can't clean up properly since it is
blocked from doing so by the check for a valid task_struct pointer.
The tapdisk2 side cleans up properly but with the 'acknowledge the
detach and then detach patch' the xl side is always guaranteed to win
the race.

So the patch moved the failure one ladder step downwards in:

[xensrc]/tools/blktap2/control/tap-ctl-destroy.c:tap_control_destroy()

But the effective result is the same, modulo the elimination of the
'hang' while the select call times out.

> There also seems to be a BLKTAP_SHUTDOWN_REQUESTED bit which looks like
> it is involved somehow...
>
> We've gotten way beyond my understanding of blktap internals though.

That bit is set by the kernel kobject infrastructure as part of the
teardown of the block device and I believe is implicated to the extent
that the xl<->tapdisk2 deadlock is caused by a reference still being
held to the block device when the the detach message is sent to
tapdisk2.

I can now reproduce the deadlock with a tap-ctl recipe which should
help in chasing things down.  CAUTION: *** the following will livelock
your kernel, you will need to have a shell available to force an
unmount of the tapdev device. ***

	1.) Create aio based device with tap-ctl.

	2.) Mount tapdev device.

	3.) Close device with tap-ctl.

	4.) Send detach message with tap-ctl.

	* Livelock *

	5.) Kill tapdisk2 process.

	6.) Unmount tapdev - kernel issues WARNING from fs/buffer.c.

Having this recipe should at least help run down which lock we are
wedging on.

> > Let me know if you are seeing the issues I'm seeing, in the meantime I
> > will keep hunting to see if I can rundown the ultimate cause of the
> > deadlock.  Given the above trace it has to be an issue with xl
> > orchestrating the release of resources which reference the tapdev
> > block device.

> I'm not convinced that the shutdown stuff on the kernel side isn't
> either horribly broken or at best fragile (perhaps xl just tickles
> it differently to xend).

I don't believe its fragile as much as the classic problem of having
the kernel dependent on userspace.

I think the underlying problem is the device release order between
xend and xl.  I believe xend is releasing all references to the tapdev
device before invoking userspace cleanup.

> Please do keep hunting and let us know what you find...

I will wade a bit deeper into xl to see if I can isolate the ordering
problem.

> Thanks,
> Ian

Thanks for the verification from your end on the problem.

Have a good day.

Greg

}-- End of excerpt from Ian Campbell

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Inspite of all evidence to the contrary, the entire universe is
 composed of only two basic substances: magic and bullshit."
                                -- Ian Macdonald

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Basic blktap2 functionality issues.
  2012-04-07 15:59 Dr. Greg Wettstein
@ 2012-04-10 10:04 ` Ian Campbell
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Campbell @ 2012-04-10 10:04 UTC (permalink / raw)
  To: greg; +Cc: xen-devel

On Sat, 2012-04-07 at 16:59 +0100, Dr. Greg Wettstein wrote:
> On Mar 30, 12:23pm, Ian Campbell wrote:
> } Subject: Re: [Xen-devel] Basic blktap2 functionality issues.
> 
> Hi, hope the weekend is going well for everyone.
> 
> Sorry for the delay in getting back to everyone on this, had a
> deadline on another project.
> 
> > On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote:
> > > I think an approach worth trying would be to have
> > > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
> > > doing the actual detach. i.e. it would respond with "Yes, I will do
> > > that" rather than "Yes, I have done that". My speculation is that this
> > > will allow libxl to continue and hopefully avoid the deadlock.
> 
> > This seems to be the case as the following fixes things for
> > me. Thanks very much for your analysis which lead me to this
> > solution...
> 
> I ported your fix into 4.1.2 but I think we still have a problem, at
> least in this codebase.
> 
> I no longer see the select timeout delay when xl shuts down but upon
> shutdown the minor number is not freed.  A 'tap-ctl list' shows a
> steadily increasing set of orphaned minor numbers as VM's are started
> up and shutdown.
> 
> Are you seeing this in your development codebase?

It turns out that I am, yes. e.g. after starting/stopping a guest 3
times:
# tap-ctl list
       -  0    -          - -
       -  1    -          - -
       -  2    -          - -

> The culprit is a failed ioctl call for BLKTAP2_IOCTL_FREE_TAP in
> tap_ctl_free().  The underlying reason for the ioctl failure is the
> check in [linuxsrc]:drivers/block/blktap/ring.c:blktap_ring_destroy()

drivers/*xen*/... right? Or do you have a different blktap to me?

> for whether or not the task_struct pointer in the blktap_ring
> structure has been NULLed.
> 
> Which certainly makes sense since there is a race between xl's call to
> tap_ctl_free() and tapdisk2 getting to the point where it can close
> its descriptor to the blktap instance and thus invoke the .release
> method which translates into a call to blktap_ring_release() which
> NULL's the task_struct pointer.

Sounds right, but then you would expect both the ioctl and release path
to cleanup, depending on who loses the race? 

There also seems to be a BLKTAP_SHUTDOWN_REQUESTED bit which looks like
it is involved somehow...

We've gotten way beyond my understanding of blktap internals though.

> If you are not seeing the orphan minor numbers there must be ordering
> changes in the unstable version of xl which eliminate or alters the
> race timing.
> 
> For the sake of completeness of information for this thread I captured
> the following stack trace of a tapdisk2 when it is deadlocked against
> xl:
[... thanks ...]
> Let me know if you are seeing the issues I'm seeing, in the meantime I
> will keep hunting to see if I can rundown the ultimate cause of the
> deadlock.  Given the above trace it has to be an issue with xl
> orchestrating the release of resources which reference the tapdev
> block device.

I'm not convinced that the shutdown stuff on the kernel side isn't
either horribly broken or at best fragile (perhaps xl just tickles it
differently to xend).

Please do keep hunting and let us know what you find...

Thanks,
Ian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Basic blktap2 functionality issues.
@ 2012-04-07 15:59 Dr. Greg Wettstein
  2012-04-10 10:04 ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Dr. Greg Wettstein @ 2012-04-07 15:59 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

On Mar 30, 12:23pm, Ian Campbell wrote:
} Subject: Re: [Xen-devel] Basic blktap2 functionality issues.

Hi, hope the weekend is going well for everyone.

Sorry for the delay in getting back to everyone on this, had a
deadline on another project.

> On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote:
> > I think an approach worth trying would be to have
> > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
> > doing the actual detach. i.e. it would respond with "Yes, I will do
> > that" rather than "Yes, I have done that". My speculation is that this
> > will allow libxl to continue and hopefully avoid the deadlock.

> This seems to be the case as the following fixes things for
> me. Thanks very much for your analysis which lead me to this
> solution...

I ported your fix into 4.1.2 but I think we still have a problem, at
least in this codebase.

I no longer see the select timeout delay when xl shuts down but upon
shutdown the minor number is not freed.  A 'tap-ctl list' shows a
steadily increasing set of orphaned minor numbers as VM's are started
up and shutdown.

Are you seeing this in your development codebase?

The culprit is a failed ioctl call for BLKTAP2_IOCTL_FREE_TAP in
tap_ctl_free().  The underlying reason for the ioctl failure is the
check in [linuxsrc]:drivers/block/blktap/ring.c:blktap_ring_destroy()
for whether or not the task_struct pointer in the blktap_ring
structure has been NULLed.

Which certainly makes sense since there is a race between xl's call to
tap_ctl_free() and tapdisk2 getting to the point where it can close
its descriptor to the blktap instance and thus invoke the .release
method which translates into a call to blktap_ring_release() which
NULL's the task_struct pointer.

If you are not seeing the orphan minor numbers there must be ordering
changes in the unstable version of xl which eliminate or alters the
race timing.

For the sake of completeness of information for this thread I captured
the following stack trace of a tapdisk2 when it is deadlocked against
xl:

---------------------------------------------------------------------------
Apr  1 07:03:45 hooter kernel: Call Trace:
Apr  1 07:03:45 hooter kernel:  [<c10aa791>] ? blkdev_get_blocks+0xb4/0xb4
Apr  1 07:03:45 hooter kernel:  [<c109ab11>] ? iput+0x28/0x143
Apr  1 07:03:45 hooter kernel:  [<c10e9fec>] ? blk_peek_request+0x155/0x165
Apr  1 07:03:45 hooter kernel:  [<c128279c>] schedule+0x4d/0x4f
Apr  1 07:03:45 hooter kernel:  [<f7a9053b>] blktap_device_destroy_sync+0x63/0x76 [blktap]
Apr  1 07:03:45 hooter kernel:  [<c104204a>] ? wake_up_bit+0x61/0x61
Apr  1 07:03:45 hooter kernel:  [<f7a8f57a>] blktap_ring_release+0xe/0x29 [blktap]
Apr  1 07:03:45 hooter kernel:  [<c108aba7>] fput+0xce/0x167
Apr  1 07:03:45 hooter kernel:  [<c107944a>] remove_vma+0x28/0x47
Apr  1 07:03:45 hooter kernel:  [<c107a19a>] do_munmap+0x1e8/0x204
Apr  1 07:03:45 hooter kernel:  [<c107a1de>] sys_munmap+0x28/0x37
Apr  1 07:03:45 hooter kernel:  [<c1284adc>] sysenter_do_call+0x12/0x2c
Apr  1 07:03:45 hooter kernel:  [<c1280000>] ? migration_call+0x1d9/0x1f2
Apr  1 07:03:45 hooter kernel:  c686dad0 00000286 c1045bcd e86581c0 e84f6380 c13d1c80 c13d1c80 d753476c
Apr  1 07:03:45 hooter kernel:  e9185b00 e9185c78 00000000 d75346cb 00002ad2 d75346cb e8515ae4 e8515ae4
Apr  1 07:03:45 hooter kernel:  c1005597 00000000 ed7d80c8 c686dae4 c686da98 c1005d10 ed7d02c0 00000000
---------------------------------------------------------------------------

Let me know if you are seeing the issues I'm seeing, in the meantime I
will keep hunting to see if I can rundown the ultimate cause of the
deadlock.  Given the above trace it has to be an issue with xl
orchestrating the release of resources which reference the tapdev
block device.

> Ian.

Will look forward to your thoughts.

Have a good weekend.

}-- End of excerpt from Ian Campbell

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"According to the philosopher Ly Tin Wheedle, chaos is found in greatest
 abundance wherever order is being sought.  It always defeats order,
 because it is better organized."
                                -- Terry Pratchett
                                   Interesting Times

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-04-17 15:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-29 14:39 Basic blktap2 functionality issues greg
2012-03-29 17:35 ` Stefano Stabellini
2012-03-30  8:02   ` Ian Campbell
2012-03-30  8:17 ` Ian Campbell
2012-03-30 11:23   ` Ian Campbell
2012-04-02 15:51     ` Ian Jackson
2012-04-07 15:59 Dr. Greg Wettstein
2012-04-10 10:04 ` Ian Campbell
2012-04-11 15:13 Dr. Greg Wettstein
2012-04-17 15:55 Dr. Greg Wettstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.