All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: driver domain crash and reconnect handling
       [not found] <81A73678E76EA642801C8F2E4823AD21014183065D27@LONPMAILBOX01.citrite.net>
@ 2013-01-21 12:20 ` Ian Campbell
       [not found] ` <1358770844.3279.194.camel@zakaz.uk.xensource.com>
  1 sibling, 0 replies; 14+ messages in thread
From: Ian Campbell @ 2013-01-21 12:20 UTC (permalink / raw)
  To: Dave Scott; +Cc: xen-api, Zoltan Kiss, 'xen-devel@lists.xen.org'

On Mon, 2013-01-21 at 11:31 +0000, Dave Scott wrote:
> Hi,
> 
> [ my apologies if this has been discussed before but I couldn't
>   find a relevant thread ]

I don't think it has.

> In XCP we're hoping to make serious use of driver domains soon.
> We'd like to tell people that their xen-based cloud is even
> more robust than before, because even if a host driver crashes,
> there is only a slight interruption to guest I/O. For this to
> work smoothly, we need to figure out how to re-establish disk
> and network I/O after the driver restart -- this is where I'd
> appreciate some advice!
> 
> Is the current xenstore protocol considered sufficient to
> support reconnecting a frontend to a new backend? I did a few
> simple experiments with an XCP driver domain prototype a while
> back and I failed to make the frontend happy -- usually it would
> become confused about the backend and become stuck. This might
> just be because I didn't know what I was doing :-)

I think the protocol is probably sufficient but the implementations of
that protocol are not...

> Zoltan (cc:d) also did a few simple experiments to see whether
> we could re-use the existing suspend/resume infrastructure,
> similar to the 'fast' resume we already use for live checkpoint.
> As an experiment he modified libxc's xc_resume.c to allow the
> guest's HYPERVISOR_suspend hypercall invocation to return with
> '0' (success) rather than '1' (cancelled). The effect of this
> was to leave the domain running, but since it thinks it has just
> resumed in another domain, it explicitly reconnects its frontends.
> With this change and one or two others (like fixing the
> start_info->{store_,console.domU}.mfns) he made it work for a
> number of oldish guests. I'm sure he can describe the changes
> needed more accurately than I can!

Would be interesting to know, especially if everything was achieved with
toolstack side changes only!

> What do you think of this approach? Since it's based on the
> existing suspend/resume code it should hopefully work with all
> guest types without having to update the frontends or hopefully even
> fix bugs in them (because it looks just like a regular resume which
> is pretty well tested everywhere). This is particularly important in
> "cloud" scenarios because the people running clouds have usually
> little or no control over the software their customers are running.
> Unfortunately if we have to wait for a PV frontend change to trickle
> into all the common distros it will be a while before we can fully
> benefit from driver domain restart. If there is a better way
> of doing this in the long term involving a frontend change, what
> do you think about this as a stopgap until the frontends are updated?

I think it could undoubtedly serve well as a stop gap.

Longer term I guess it depends on the shortcomings of this approach
whether we also want to do something more advanced in the PV drivers
upstream and have them trickle through. The main downsides I suppose is
the brief outage due to the proto-suspend plus the requirement to
reconnect all devices and not just the failed one?

I expect the outage due to the proto-suspend is dwarfed by the outage
caused by a backend going away for however long it takes to notice,
rebuild, reset the hardware, etc etc.

The "it's just a normal-ish suspend" argument is pretty compelling since
you are correct that it is likely to be better tested than a crashing
driver domain.

Ian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
       [not found] ` <1358770844.3279.194.camel@zakaz.uk.xensource.com>
@ 2013-01-23 21:58   ` Zoltan Kiss
  2013-01-24  9:59     ` Ian Campbell
                       ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Zoltan Kiss @ 2013-01-23 21:58 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-api, Dave Scott, 'xen-devel@lists.xen.org'

Hi,

On 21/01/13 12:20, Ian Campbell wrote:
> On Mon, 2013-01-21 at 11:31 +0000, Dave Scott wrote:
>> Is the current xenstore protocol considered sufficient to
>> support reconnecting a frontend to a new backend? I did a few
>> simple experiments with an XCP driver domain prototype a while
>> back and I failed to make the frontend happy -- usually it would
>> become confused about the backend and become stuck. This might
>> just be because I didn't know what I was doing :-)
>
> I think the protocol is probably sufficient but the implementations of
> that protocol are not...
What kind of problems do you think about?

>> Zoltan (cc:d) also did a few simple experiments to see whether
>> we could re-use the existing suspend/resume infrastructure,
>> similar to the 'fast' resume we already use for live checkpoint.
>> As an experiment he modified libxc's xc_resume.c to allow the
>> guest's HYPERVISOR_suspend hypercall invocation to return with
>> '0' (success) rather than '1' (cancelled). The effect of this
>> was to leave the domain running, but since it thinks it has just
>> resumed in another domain, it explicitly reconnects its frontends.
>> With this change and one or two others (like fixing the
>> start_info->{store_,console.domU}.mfns) he made it work for a
>> number of oldish guests. I'm sure he can describe the changes
>> needed more accurately than I can!
>
> Would be interesting to know, especially if everything was achieved with
> toolstack side changes only!
Actually I've used the xc_domain_resume_any() function from libxc to 
resume the guests. It worked with PV guests, however with some hacks in 
the hypervisor to silently discarding the error condicions, and not 
returning from the hypercall with an error. The two guests I've used, 
and their problems with the hypercall return values:

- SLES 11 SP1 (2.6.32.12) crashes because VCPUOP_register_vcpu_info 
hypercall returns EINVAL, as ( v->arch.vcpu_info_mfn != INVALID_MFN )
- Debian Squeeze 6.0 (2.6.32-5) crashes because EVTCHNOP_bind_virq 
returns EEXISTS, as ( v->virq_to_evtchnvirq != 0 )
- (these hypercalls were made right after guest comes back from the 
suspend hypercall)

I suppose there will be similar problems with other PV guests, I intend 
to test other ones as well. My current problem is to architect a proper 
solution instead of my hacks in the hypervisor. I think we can't access 
those data areas outside the hypervisor (v is a "struct vcpu" equals 
current->domain->vcpu[vcpuid]), and unfortunately as I see Xen forgets 
the fact that the domain was suspended by the time these hypercalls comes.

Windows however seems to be less problematic, I've tested Windows 7 with 
XenServer 6.1 PV drivers, and it worked seamlessly. That driver doesn't 
care about the suspend hypercall return value, it just do a full 
close-open cycle. It worked with the fast/cooperative way, obviously.

>> What do you think of this approach? Since it's based on the
>> existing suspend/resume code it should hopefully work with all
>> guest types without having to update the frontends or hopefully even
>> fix bugs in them (because it looks just like a regular resume which
>> is pretty well tested everywhere). This is particularly important in
>> "cloud" scenarios because the people running clouds have usually
>> little or no control over the software their customers are running.
>> Unfortunately if we have to wait for a PV frontend change to trickle
>> into all the common distros it will be a while before we can fully
>> benefit from driver domain restart. If there is a better way
>> of doing this in the long term involving a frontend change, what
>> do you think about this as a stopgap until the frontends are updated?
>
> I think it could undoubtedly serve well as a stop gap.
>
> Longer term I guess it depends on the shortcomings of this approach
> whether we also want to do something more advanced in the PV drivers
> upstream and have them trickle through. The main downsides I suppose is
> the brief outage due to the proto-suspend plus the requirement to
> reconnect all devices and not just the failed one?
I think the current solution to reuse the suspend/resume is quite 
viable, however it has the mentioned drawbacks, the extra failure points 
of doing the suspend hypercall and reinit all the frontend devices, not 
just the affected ones. In the long term I think we should implement 
this as an extra feature, which could be controlled through xenstore. I 
already has a prototype version for Linux netfront, but it works through 
sysfs. It calls the same suspend resume callbacks, but only for the 
affected devices.

> I expect the outage due to the proto-suspend is dwarfed by the outage
> caused by a backend going away for however long it takes to notice,
> rebuild, reset the hardware, etc etc.
Indeed, probably the backend restoration would take at least 5 seconds. 
Compared to that, the suspend-resume and the frontend device reinit is 
much shorter.
Probably in storage driver domains it's better to suspend the guest 
immediately when the backend is gone, as the guest can easily crash if 
the block device is inaccessible for a long time. In case of network 
access, this isn't such a big problem.

Regards,

Zoli

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
  2013-01-23 21:58   ` Zoltan Kiss
@ 2013-01-24  9:59     ` Ian Campbell
  2013-01-24 11:45     ` George Shuklin
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Ian Campbell @ 2013-01-24  9:59 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: xen-api, Dave Scott, 'xen-devel@lists.xen.org'

On Wed, 2013-01-23 at 21:58 +0000, Zoltan Kiss wrote:
> Hi,
> 
> On 21/01/13 12:20, Ian Campbell wrote:
> > On Mon, 2013-01-21 at 11:31 +0000, Dave Scott wrote:
> >> Is the current xenstore protocol considered sufficient to
> >> support reconnecting a frontend to a new backend? I did a few
> >> simple experiments with an XCP driver domain prototype a while
> >> back and I failed to make the frontend happy -- usually it would
> >> become confused about the backend and become stuck. This might
> >> just be because I didn't know what I was doing :-)
> >
> > I think the protocol is probably sufficient but the implementations of
> > that protocol are not...
> What kind of problems do you think about?

Just lack of testing of the code paths in that way, my gut feeling is
that there will inevitably be frontends which can't cope, but maybe I'm
pessimistic.

> >> Zoltan (cc:d) also did a few simple experiments to see whether
> >> we could re-use the existing suspend/resume infrastructure,
> >> similar to the 'fast' resume we already use for live checkpoint.
> >> As an experiment he modified libxc's xc_resume.c to allow the
> >> guest's HYPERVISOR_suspend hypercall invocation to return with
> >> '0' (success) rather than '1' (cancelled). The effect of this
> >> was to leave the domain running, but since it thinks it has just
> >> resumed in another domain, it explicitly reconnects its frontends.
> >> With this change and one or two others (like fixing the
> >> start_info->{store_,console.domU}.mfns) he made it work for a
> >> number of oldish guests. I'm sure he can describe the changes
> >> needed more accurately than I can!
> >
> > Would be interesting to know, especially if everything was achieved with
> > toolstack side changes only!
> Actually I've used the xc_domain_resume_any() function from libxc to 
> resume the guests. It worked with PV guests, however with some hacks in 
> the hypervisor to silently discarding the error condicions, and not 
> returning from the hypercall with an error. The two guests I've used, 
> and their problems with the hypercall return values:
> 
> - SLES 11 SP1 (2.6.32.12) crashes because VCPUOP_register_vcpu_info 
> hypercall returns EINVAL, as ( v->arch.vcpu_info_mfn != INVALID_MFN )
> - Debian Squeeze 6.0 (2.6.32-5) crashes because EVTCHNOP_bind_virq 
> returns EEXISTS, as ( v->virq_to_evtchnvirq != 0 )
> - (these hypercalls were made right after guest comes back from the 
> suspend hypercall)

The toolstack might need to do EVTCHNOP_reset or do some other cleanup?

One difference between a cancelled suspend (i.e. resuming in the old
domain) and a normal/successful one is that in the normal case you are
starting in a fresh domain, so things like evtchns are all unbound and
must be redone whereas in the cancelled case some of the old state can
persist and needs to be reset. xend has some code which might form a
useful basis of a list of things which may need resetting, see
tools/python/xen/xend/XendDomainInfo.py resumeDomain.

> I suppose there will be similar problems with other PV guests, I intend 
> to test other ones as well. My current problem is to architect a proper 
> solution instead of my hacks in the hypervisor. I think we can't access 
> those data areas outside the hypervisor (v is a "struct vcpu" equals 
> current->domain->vcpu[vcpuid]), and unfortunately as I see Xen forgets 
> the fact that the domain was suspended by the time these hypercalls comes.

Xen isn't generally aware of things like suspend, it just sees a
domain/vcpu getting torn down and new ones (unrelated as far as Xen
knows) being created.

> Probably in storage driver domains it's better to suspend the guest 
> immediately when the backend is gone, as the guest can easily crash if 
> the block device is inaccessible for a long time. In case of network 
> access, this isn't such a big problem.

Pausing guests when one of their supporting driver domains goes away
does seem like a good idea.

I suppose the flip side is that a domain which isn't using a disk which
goes away briefly would see a hiccup it wouldn't have otherwise seen.

Ian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
  2013-01-23 21:58   ` Zoltan Kiss
  2013-01-24  9:59     ` Ian Campbell
@ 2013-01-24 11:45     ` George Shuklin
       [not found]     ` <51011EF5.9080708@gmail.com>
       [not found]     ` <1359021585.17440.93.camel@zakaz.uk.xensource.com>
  3 siblings, 0 replies; 14+ messages in thread
From: George Shuklin @ 2013-01-24 11:45 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: 'xen-devel@lists.xen.org', Dave Scott, Ian Campbell, xen-api


>> I expect the outage due to the proto-suspend is dwarfed by the outage
>> caused by a backend going away for however long it takes to notice,
>> rebuild, reset the hardware, etc etc.
> Indeed, probably the backend restoration would take at least 5 
> seconds. Compared to that, the suspend-resume and the frontend device 
> reinit is much shorter.
> Probably in storage driver domains it's better to suspend the guest 
> immediately when the backend is gone, as the guest can easily crash if 
> the block device is inaccessible for a long time. In case of network 
> access, this isn't such a big problem.
>
>
Some notes about guest suspend during IO.

I tested that way for storage reboot (pause all domains, reboot ISCSI 
storage and resume every domain). If pause is short (less that 2 
minutes), guest can survive. If pause is longer than 2 minutes, guests 
in state of waiting for io completion, detects IO timeout after 
resuming  and cause IO error on virtual block devices. (PV).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
       [not found]     ` <51011EF5.9080708@gmail.com>
@ 2013-01-24 12:57       ` Zoltan Kiss
  2013-01-24 13:25       ` Paul Durrant
       [not found]       ` <291EDFCB1E9E224A99088639C4762022013F451DCB22@LONPMAILBOX01.citrite.net>
  2 siblings, 0 replies; 14+ messages in thread
From: Zoltan Kiss @ 2013-01-24 12:57 UTC (permalink / raw)
  To: George Shuklin
  Cc: 'xen-devel@lists.xen.org',
	Dave Scott, Ian Campbell, Paul Durrant, xen-api

On 24/01/13 11:45, George Shuklin wrote:
>
>>> I expect the outage due to the proto-suspend is dwarfed by the outage
>>> caused by a backend going away for however long it takes to notice,
>>> rebuild, reset the hardware, etc etc.
>> Indeed, probably the backend restoration would take at least 5
>> seconds. Compared to that, the suspend-resume and the frontend device
>> reinit is much shorter.
>> Probably in storage driver domains it's better to suspend the guest
>> immediately when the backend is gone, as the guest can easily crash if
>> the block device is inaccessible for a long time. In case of network
>> access, this isn't such a big problem.
>>
>>
> Some notes about guest suspend during IO.
>
> I tested that way for storage reboot (pause all domains, reboot ISCSI
> storage and resume every domain). If pause is short (less that 2
> minutes), guest can survive. If pause is longer than 2 minutes, guests
> in state of waiting for io completion, detects IO timeout after
> resuming  and cause IO error on virtual block devices. (PV).

Good point! I haven't considered that even if the guest is paused, 
during coming back it will still notice that timers expired. I think the 
original idea came from Paul, CCing him to raise awareness about this 
problem.

Zoli

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
       [not found]     ` <51011EF5.9080708@gmail.com>
  2013-01-24 12:57       ` Zoltan Kiss
@ 2013-01-24 13:25       ` Paul Durrant
       [not found]       ` <291EDFCB1E9E224A99088639C4762022013F451DCB22@LONPMAILBOX01.citrite.net>
  2 siblings, 0 replies; 14+ messages in thread
From: Paul Durrant @ 2013-01-24 13:25 UTC (permalink / raw)
  To: George Shuklin, Zoltan Kiss
  Cc: Ian Campbell, xen-api, Dave Scott, 'xen-devel@lists.xen.org'

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of George Shuklin
> Sent: 24 January 2013 11:46
> To: Zoltan Kiss
> Cc: 'xen-devel@lists.xen.org'; Dave Scott; Ian Campbell; xen-
> api@lists.xen.org
> Subject: Re: [Xen-devel] driver domain crash and reconnect handling
> 
> 
> >> I expect the outage due to the proto-suspend is dwarfed by the outage
> >> caused by a backend going away for however long it takes to notice,
> >> rebuild, reset the hardware, etc etc.
> > Indeed, probably the backend restoration would take at least 5
> > seconds. Compared to that, the suspend-resume and the frontend device
> > reinit is much shorter.
> > Probably in storage driver domains it's better to suspend the guest
> > immediately when the backend is gone, as the guest can easily crash if
> > the block device is inaccessible for a long time. In case of network
> > access, this isn't such a big problem.
> >
> >
> Some notes about guest suspend during IO.
> 
> I tested that way for storage reboot (pause all domains, reboot ISCSI storage
> and resume every domain). If pause is short (less that 2 minutes), guest can
> survive. If pause is longer than 2 minutes, guests in state of waiting for io
> completion, detects IO timeout after resuming  and cause IO error on virtual
> block devices. (PV).
> 

To be clear here: do you mean you *paused* and then unpaused the VMs, or *suspended* and then resumed the VMs? I suspect you mean the former.

  Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
       [not found]       ` <291EDFCB1E9E224A99088639C4762022013F451DCB22@LONPMAILBOX01.citrite.net>
@ 2013-01-24 14:06         ` George Shuklin
       [not found]         ` <51013FED.60201@gmail.com>
  1 sibling, 0 replies; 14+ messages in thread
From: George Shuklin @ 2013-01-24 14:06 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Ian Campbell, xen-api, Dave Scott, Zoltan Kiss,
	'xen-devel@lists.xen.org'

24.01.2013 17:25, Paul Durrant пишет:
>>
>> Some notes about guest suspend during IO.
>>
>> I tested that way for storage reboot (pause all domains, reboot ISCSI storage
>> and resume every domain). If pause is short (less that 2 minutes), guest can
>> survive. If pause is longer than 2 minutes, guests in state of waiting for io
>> completion, detects IO timeout after resuming  and cause IO error on virtual
>> block devices. (PV).
>>
> To be clear here: do you mean you *paused* and then unpaused the VMs, or *suspended* and then resumed the VMs? I suspect you mean the former.
>
>    Paul
Pause, of cause. My bad.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
       [not found]         ` <51013FED.60201@gmail.com>
@ 2013-01-24 15:01           ` Zoltan Kiss
       [not found]           ` <51014CDC.1030501@citrix.com>
  1 sibling, 0 replies; 14+ messages in thread
From: Zoltan Kiss @ 2013-01-24 15:01 UTC (permalink / raw)
  To: George Shuklin
  Cc: xen-api, Ian Campbell, Paul Durrant, Dave Scott,
	'xen-devel@lists.xen.org'

On 24/01/13 14:06, George Shuklin wrote:
> 24.01.2013 17:25, Paul Durrant пишет:
>>>
>>> Some notes about guest suspend during IO.
>>>
>>> I tested that way for storage reboot (pause all domains, reboot ISCSI storage
>>> and resume every domain). If pause is short (less that 2 minutes), guest can
>>> survive. If pause is longer than 2 minutes, guests in state of waiting for io
>>> completion, detects IO timeout after resuming  and cause IO error on virtual
>>> block devices. (PV).
>>>
>> To be clear here: do you mean you *paused* and then unpaused the VMs, or *suspended* and then resumed the VMs? I suspect you mean the former.
>>
>>     Paul
> Pause, of cause. My bad.
>

If you would do a suspend, the frontend driver flush out disk IO 
operations before suspend reached, and therefore there won't be anything 
to timeout after resume. However, if the storage driver domain just 
crashed, I guess the guest would crash at suspend. Maybe we can try out 
something to save the the ring buffer, and replay them back once the 
backend come back (but before resuming the guest). But I'm not sure 
whether the guest would handle the timeouts after the resume first, or 
cancel them if the requests were succesfully responded.

Zoli

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
       [not found]           ` <51014CDC.1030501@citrix.com>
@ 2013-01-24 15:10             ` Andrew Cooper
  2013-01-24 19:42               ` Zoltan Kiss
  2013-01-24 17:14             ` Ian Campbell
       [not found]             ` <1359047667.32057.31.camel@zakaz.uk.xensource.com>
  2 siblings, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2013-01-24 15:10 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Dave Scott, George Shuklin, 'xen-devel@lists.xen.org',
	Paul Durrant, xen-api, Ian Campbell

On 24/01/13 15:01, Zoltan Kiss wrote:
> On 24/01/13 14:06, George Shuklin wrote:
>> 24.01.2013 17:25, Paul Durrant пишет:
>>>> Some notes about guest suspend during IO.
>>>>
>>>> I tested that way for storage reboot (pause all domains, reboot ISCSI storage
>>>> and resume every domain). If pause is short (less that 2 minutes), guest can
>>>> survive. If pause is longer than 2 minutes, guests in state of waiting for io
>>>> completion, detects IO timeout after resuming  and cause IO error on virtual
>>>> block devices. (PV).
>>>>
>>> To be clear here: do you mean you *paused* and then unpaused the VMs, or *suspended* and then resumed the VMs? I suspect you mean the former.
>>>
>>>     Paul
>> Pause, of cause. My bad.
>>
> If you would do a suspend, the frontend driver flush out disk IO 
> operations before suspend reached, and therefore there won't be anything 
> to timeout after resume. However, if the storage driver domain just 
> crashed, I guess the guest would crash at suspend. Maybe we can try out 
> something to save the the ring buffer, and replay them back once the 
> backend come back (but before resuming the guest). But I'm not sure 
> whether the guest would handle the timeouts after the resume first, or 
> cancel them if the requests were succesfully responded.
>
> Zoli

Perhaps I am making this harder, but might it be best to wait for a
short while (15-30 seconds) for the device driver domain to come back,
and if it takes longer than that, pause the VM.

This way, if the driver domain is fast to come back, all the guest
notices is transitorily blocked IO, and if the driver domain is too slow
(but does come back), all the guest might notices is a pause.

Ultimately, if the driver domain never comes back, then we are in a no
worse position than currently.

~Andrew

>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
       [not found]           ` <51014CDC.1030501@citrix.com>
  2013-01-24 15:10             ` Andrew Cooper
@ 2013-01-24 17:14             ` Ian Campbell
       [not found]             ` <1359047667.32057.31.camel@zakaz.uk.xensource.com>
  2 siblings, 0 replies; 14+ messages in thread
From: Ian Campbell @ 2013-01-24 17:14 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: xen-api, Paul Durrant, George Shuklin, Dave Scott,
	'xen-devel@lists.xen.org'

On Thu, 2013-01-24 at 15:01 +0000, Zoltan Kiss wrote:
> If you would do a suspend, the frontend driver flush out disk IO 
> operations before suspend reached, and therefore there won't be anything 
> to timeout after resume.

Actually the behaviour, of Linux blkfront at least, is to not do
anything on suspend but instead to replay the outstanding requests on
the ring on resume. This is to support checkpointing (where you don't
need to replay but I think it is exactly what you want for the driver
domain crash case too.

Ian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
       [not found]             ` <1359047667.32057.31.camel@zakaz.uk.xensource.com>
@ 2013-01-24 19:39               ` Zoltan Kiss
  0 siblings, 0 replies; 14+ messages in thread
From: Zoltan Kiss @ 2013-01-24 19:39 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-api, Paul Durrant, George Shuklin, Dave Scott,
	'xen-devel@lists.xen.org'

On 24/01/13 17:14, Ian Campbell wrote:
> On Thu, 2013-01-24 at 15:01 +0000, Zoltan Kiss wrote:
>> If you would do a suspend, the frontend driver flush out disk IO
>> operations before suspend reached, and therefore there won't be anything
>> to timeout after resume.
>
> Actually the behaviour, of Linux blkfront at least, is to not do
> anything on suspend but instead to replay the outstanding requests on
> the ring on resume. This is to support checkpointing (where you don't
> need to replay but I think it is exactly what you want for the driver
> domain crash case too.

That might be true, sorry, I worked from my faulty memories and haven't 
checked it again.

Zoli

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
  2013-01-24 15:10             ` Andrew Cooper
@ 2013-01-24 19:42               ` Zoltan Kiss
  0 siblings, 0 replies; 14+ messages in thread
From: Zoltan Kiss @ 2013-01-24 19:42 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Dave Scott, George Shuklin, 'xen-devel@lists.xen.org',
	Paul Durrant, xen-api, Ian Campbell

On 24/01/13 15:10, Andrew Cooper wrote:
>> If you would do a suspend, the frontend driver flush out disk IO
>> >operations before suspend reached, and therefore there won't be anything
>> >to timeout after resume. However, if the storage driver domain just
>> >crashed, I guess the guest would crash at suspend. Maybe we can try out
>> >something to save the the ring buffer, and replay them back once the
>> >backend come back (but before resuming the guest). But I'm not sure
>> >whether the guest would handle the timeouts after the resume first, or
>> >cancel them if the requests were succesfully responded.
>> >
>> >Zoli
> Perhaps I am making this harder, but might it be best to wait for a
> short while (15-30 seconds) for the device driver domain to come back,
> and if it takes longer than that, pause the VM.
>
> This way, if the driver domain is fast to come back, all the guest
> notices is transitorily blocked IO, and if the driver domain is too slow
> (but does come back), all the guest might notices is a pause.
>
> Ultimately, if the driver domain never comes back, then we are in a no
> worse position than currently.

As Paul mentioned, pausing doesn't cause the guest to reconnect to the 
new backend, so you would need a suspend/resume. But in George's case, 
where the driver domain remains the same, this can work.
However to avoid George's problem with timeouts, a reconnect should be 
necessary. As Ian mentioned, the guest will replay the ring and that 
might help to avoid the timouts to happen.

Zoli

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: driver domain crash and reconnect handling
       [not found]     ` <1359021585.17440.93.camel@zakaz.uk.xensource.com>
@ 2013-01-24 19:51       ` Zoltan Kiss
  0 siblings, 0 replies; 14+ messages in thread
From: Zoltan Kiss @ 2013-01-24 19:51 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-api, Dave Scott, 'xen-devel@lists.xen.org'

On 24/01/13 09:59, Ian Campbell wrote:
>> Actually I've used the xc_domain_resume_any() function from libxc to
>> resume the guests. It worked with PV guests, however with some hacks in
>> the hypervisor to silently discarding the error condicions, and not
>> returning from the hypercall with an error. The two guests I've used,
>> and their problems with the hypercall return values:
>>
>> - SLES 11 SP1 (2.6.32.12) crashes because VCPUOP_register_vcpu_info
>> hypercall returns EINVAL, as ( v->arch.vcpu_info_mfn != INVALID_MFN )
>> - Debian Squeeze 6.0 (2.6.32-5) crashes because EVTCHNOP_bind_virq
>> returns EEXISTS, as ( v->virq_to_evtchnvirq != 0 )
>> - (these hypercalls were made right after guest comes back from the
>> suspend hypercall)
>
> The toolstack might need to do EVTCHNOP_reset or do some other cleanup?
Yep, that might be an another solution, to reset these values from 
toolstack via hypercall(s), but as far as I checked all the current 
hypercalls which are changing these things, doing a lot of other stuff 
which we not necessarily want. So it might be necessary to define a new 
hypercall specifically for this use-case. Probably it's easier than make 
Xen aware that a suspend/resume happened, and the guest remained in the 
same domain.

> Pausing guests when one of their supporting driver domains goes away
> does seem like a good idea.
>
> I suppose the flip side is that a domain which isn't using a disk which
> goes away briefly would see a hiccup it wouldn't have otherwise seen.
Well, I think it would be quite complicated to watch the ring buffer for 
activities while there is no backend connected. I would say this is an 
acceptable loss.

Zoli

^ permalink raw reply	[flat|nested] 14+ messages in thread

* driver domain crash and reconnect handling
@ 2013-01-21 11:31 Dave Scott
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Scott @ 2013-01-21 11:31 UTC (permalink / raw)
  To: 'xen-devel@lists.xen.org'; +Cc: Zoltan Kiss, xen-api

Hi,

[ my apologies if this has been discussed before but I couldn't
  find a relevant thread ]

In XCP we're hoping to make serious use of driver domains soon.
We'd like to tell people that their xen-based cloud is even
more robust than before, because even if a host driver crashes,
there is only a slight interruption to guest I/O. For this to
work smoothly, we need to figure out how to re-establish disk
and network I/O after the driver restart -- this is where I'd
appreciate some advice!

Is the current xenstore protocol considered sufficient to
support reconnecting a frontend to a new backend? I did a few
simple experiments with an XCP driver domain prototype a while
back and I failed to make the frontend happy -- usually it would
become confused about the backend and become stuck. This might
just be because I didn't know what I was doing :-)

Zoltan (cc:d) also did a few simple experiments to see whether
we could re-use the existing suspend/resume infrastructure,
similar to the 'fast' resume we already use for live checkpoint.
As an experiment he modified libxc's xc_resume.c to allow the
guest's HYPERVISOR_suspend hypercall invocation to return with
'0' (success) rather than '1' (cancelled). The effect of this
was to leave the domain running, but since it thinks it has just
resumed in another domain, it explicitly reconnects its frontends.
With this change and one or two others (like fixing the
start_info->{store_,console.domU}.mfns) he made it work for a
number of oldish guests. I'm sure he can describe the changes
needed more accurately than I can!

What do you think of this approach? Since it's based on the
existing suspend/resume code it should hopefully work with all
guest types without having to update the frontends or hopefully even
fix bugs in them (because it looks just like a regular resume which
is pretty well tested everywhere). This is particularly important in
"cloud" scenarios because the people running clouds have usually
little or no control over the software their customers are running.
Unfortunately if we have to wait for a PV frontend change to trickle
into all the common distros it will be a while before we can fully
benefit from driver domain restart. If there is a better way
of doing this in the long term involving a frontend change, what
do you think about this as a stopgap until the frontends are updated?

Cheers,
Dave

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-01-24 19:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <81A73678E76EA642801C8F2E4823AD21014183065D27@LONPMAILBOX01.citrite.net>
2013-01-21 12:20 ` driver domain crash and reconnect handling Ian Campbell
     [not found] ` <1358770844.3279.194.camel@zakaz.uk.xensource.com>
2013-01-23 21:58   ` Zoltan Kiss
2013-01-24  9:59     ` Ian Campbell
2013-01-24 11:45     ` George Shuklin
     [not found]     ` <51011EF5.9080708@gmail.com>
2013-01-24 12:57       ` Zoltan Kiss
2013-01-24 13:25       ` Paul Durrant
     [not found]       ` <291EDFCB1E9E224A99088639C4762022013F451DCB22@LONPMAILBOX01.citrite.net>
2013-01-24 14:06         ` George Shuklin
     [not found]         ` <51013FED.60201@gmail.com>
2013-01-24 15:01           ` Zoltan Kiss
     [not found]           ` <51014CDC.1030501@citrix.com>
2013-01-24 15:10             ` Andrew Cooper
2013-01-24 19:42               ` Zoltan Kiss
2013-01-24 17:14             ` Ian Campbell
     [not found]             ` <1359047667.32057.31.camel@zakaz.uk.xensource.com>
2013-01-24 19:39               ` Zoltan Kiss
     [not found]     ` <1359021585.17440.93.camel@zakaz.uk.xensource.com>
2013-01-24 19:51       ` Zoltan Kiss
2013-01-21 11:31 Dave Scott

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.