* Cancelling asynchronous operations in libxl
@ 2015-01-20 13:50 Dave Scott
  2015-01-20 16:38 ` Ian Jackson
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Scott @ 2015-01-20 13:50 UTC (permalink / raw)
  To: Xen-devel; +Cc: Ian Jackson, Euan Harris, Ian Campbell

Hi,

Firstly, sorry for the extreme lateness of this reply!

I’ve re-read the thread from Nov 2013: (2013!)

http://lists.xen.org/archives/html/xen-devel/2013-11/msg01176.html

and found it quite thought-provoking.

From the Xapi/Xenopsd point of view, the main feature that we’d like is to be able to ‘unstick’ the system when it appears stuck. When the user gets bored and hits the big red “cancel” button we’d like the particular operation/thread/call to unblock (in a timely fashion, it’s probably ok if this takes 30s?) and for the system to be left in some kind of manageable state. I think it’s ok for Xapi/Xenopsd to destroy any half-built VMs via fresh libxl calls afterwards, so libxl doesn’t need to tidy everything itself automatically.

I think cancellation could be quite hard to test. One thing we could do is add a counter and increment it every time we pass a point where cancellation is possible. In some libxl debug mode we could configure it to simulate a cancellation event when the counter reaches a specific value. A test harness could then try to walk through all the different cancellation possibilities and check the system is in some sensible state afterwards.
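
To make the counter idea concrete, here is a minimal sketch in plain C. It is only an illustration under my own assumptions: none of these names (cancel_injector, cancel_point, run_toy_operation, walk_all_cancel_points) exist in libxl, and the "operation" is a toy loop rather than a real domain build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug state; nothing here is part of libxl. */
struct cancel_injector {
    long counter;     /* cancellation points passed so far */
    long trigger_at;  /* simulate a cancel at this point; < 0 disables */
};

/* Call at every point where cancellation is possible.
 * Returns true exactly when the configured point is reached. */
static bool cancel_point(struct cancel_injector *inj)
{
    assert(inj != NULL);
    inj->counter++;
    return inj->trigger_at >= 0 && inj->counter == inj->trigger_at;
}

/* Toy operation with one cancellation point before each step.
 * Returns the number of steps completed before it was cut short. */
static int run_toy_operation(struct cancel_injector *inj, int steps)
{
    int done = 0;
    for (int i = 0; i < steps; i++) {
        if (cancel_point(inj))
            break;              /* simulated cancellation event */
        done++;
    }
    return done;
}

/* Harness: trigger a cancel at each point in turn and count the runs
 * that stopped exactly where expected. A real harness would instead
 * check that the system is in a sensible state afterwards. */
static int walk_all_cancel_points(int steps)
{
    int as_expected = 0;
    for (long k = 1; k <= steps; k++) {
        struct cancel_injector inj = { .counter = 0, .trigger_at = k };
        if (run_toy_operation(&inj, steps) == (int)(k - 1))
            as_expected++;
    }
    return as_expected;
}
```

The harness needs to know how many cancellation points an operation has; one way to find out would be a first run with the injector disabled, reading the counter at the end.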

We were thinking about running some number of libxl-based stateless worker processes which would also allow us to kill them with various signals if we really needed to. I guess in the event that libxl cancel didn’t work for whatever reason, we could fall back to this rather cruder approach (although this should be only in extreme circumstances).

Anyway, sorry again for the lateness!

Cheers,
Dave


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: Cancelling asynchronous operations in libxl
  2015-01-20 13:50 Cancelling asynchronous operations in libxl Dave Scott
@ 2015-01-20 16:38 ` Ian Jackson
  2015-01-28 16:13   ` Euan Harris
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2015-01-20 16:38 UTC (permalink / raw)
  To: Dave Scott; +Cc: Euan Harris, Ian Campbell, Xen-devel

Dave Scott writes ("Cancelling asynchronous operations in libxl"):
> I’ve re-read the thread from Nov 2013: (2013!)
> http://lists.xen.org/archives/html/xen-devel/2013-11/msg01176.html
> and found it quite thought-provoking.

Thanks.

However, I think the message you really want is

  From: Ian Jackson <ian.jackson@eu.citrix.com>
  To: <xen-devel@lists.xensource.com>
  CC: Ian Campbell <ian.campbell@citrix.com>
  Subject: [RFC PATCH 00/14] libxl: Asynchronous event cancellation
  Date: Fri, 20 Dec 2013 18:45:38 +0000

and the subsequent thread, which I can't find in the
lists.xenproject.org archives but I did find for example here:

  http://osdir.com/ml/xen-development/2013-12/msg00472.html

Getting the patch series out of the archive there will be a PITA so
I have pushed it to my repo on xenbits:

  git://xenbits.xen.org/people/iwj/xen.git
  base.ao-cancel.v1-2013-12..wip.ao-cancel.v1-2013-12

What I need to know, really, is:

 * Is an API along these lines going to meet your needs ?

 * Can you help me test it ?  Trying to test this in xl is going to be
   awkward and involve a lot of extraneous and very complicated signal
   handling; and AFAIAA libvirt doesn't have any cancellation
   facility.

   So if your libxl callers can exercise this cancellation
   functionality then that would be much easier.

 * Any further comments (eg, re timescales etc).


> From the Xapi/Xenopsd point of view, the main feature that we’d like
> is to be able to ‘unstick’ the system when it appears stuck. When
> the user gets bored and hits the big red “cancel” button we’d like
> the particular operation/thread/call to unblock (in a timely
> fashion, it’s probably ok if this takes 30s?) and for the system to
> be left in some kind of manageable state. I think it’s ok for
> Xapi/Xenopsd to destroy any half-built VMs via fresh libxl calls
> afterwards, so libxl doesn’t need to tidy everything itself
> automatically.

This is roughly what the cancellation system is supposed to do.

> I think cancellation could be quite hard to test. One thing we could
> do is add a counter and increment it every time we pass a point
> where cancellation is possible. In some libxl debug mode we could
> configure it to simulate a cancellation event when the counter
> reaches a specific value. A test harness could then try to walk
> through all the different cancellation possibilities and check the
> system is in some sensible state afterwards.

I think it might be possible to add something like that to my
cancellation proposal.

> We were thinking about running some number of libxl-based stateless
> worker processes which would also allow us to kill them with various
> signals if we really needed to. I guess in the event that libxl
> cancel didn’t work for whatever reason, we could fall back to this
> rather cruder approach (although this should be only in extreme
> circumstances).

Just killing a process executing a libxl operation is likely to leave
the system in an `ugly' state.  libxl ought still to be able to deal
with it, in principle, but I wouldn't be surprised to find bugs
lurking in this kind of area.  This ought to be a last resort.

Thanks,
Ian.


* Re: Cancelling asynchronous operations in libxl
  2015-01-20 16:38 ` Ian Jackson
@ 2015-01-28 16:13   ` Euan Harris
  2015-01-28 16:57     ` Ian Jackson
  0 siblings, 1 reply; 10+ messages in thread
From: Euan Harris @ 2015-01-28 16:13 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Dave Scott, Ian Campbell, Xen-devel

Hi,

On Tue, Jan 20, 2015 at 04:38:24PM +0000, Ian Jackson wrote:
>  * Is an API along these lines going to meet your needs ?

The API you propose for libxl_ao_cancel, as described in the comment in
libxl.h, looks reasonable to us.    The comment for ERROR_NOTIMPLEMENTED
is a bit confusing: under what circumstances might a task actually be
cancelled although libxl_ao_cancel returned ERROR_NOTIMPLEMENTED?

>  * Can you help me test it ?  Trying to test this in xl is going to be
>    awkward and involve a lot of extraneous and very complicated signal
>    handling; and AFAIAA libvirt doesn't have any cancellation
>    facility.

Yes, of course.   However, wouldn't it also be useful for xl to gain
the ability to cancel long-running operations by handling SIGINT?

>  * Any further comments (eg, re timescales etc).

None that we can think of at the moment.

Thanks,
Euan


* Re: Cancelling asynchronous operations in libxl
  2015-01-28 16:13   ` Euan Harris
@ 2015-01-28 16:57     ` Ian Jackson
  2015-02-02 17:43       ` Ian Jackson
  2015-06-24 15:33       ` Euan Harris
  0 siblings, 2 replies; 10+ messages in thread
From: Ian Jackson @ 2015-01-28 16:57 UTC (permalink / raw)
  To: Euan Harris; +Cc: Dave Scott, Ian Campbell, Xen-devel

Euan Harris writes ("Re: Cancelling asynchronous operations in libxl"):
> On Tue, Jan 20, 2015 at 04:38:24PM +0000, Ian Jackson wrote:
> >  * Is an API along these lines going to meet your needs ?
> 
> The API you propose for libxl_ao_cancel, as described in the comment in
> libxl.h, looks reasonable to us.    The comment for ERROR_NOTIMPLEMENTED
> is a bit confusing: under what circumstances might a task actually be
> cancelled although libxl_ao_cancel returned ERROR_NOTIMPLEMENTED?

A single operation may go through phases during which cancellation is
effective, and phases during which it is not very effective because it
hasn't been properly hooked up.  If libxl_ao_cancel is called during
the latter, it will return ERROR_NOTIMPLEMENTED but the operation will
still be marked as wanting-cancellation, so if it enters a phase where
cancellation is effective, it will stop at that point.

To put it another way, what libxl_ao_cancel does is:
  - find the ao in question, hopefully
  - make a note in the ao that it ought to be cancelled
  - look for something internal that has registered a
     cancellation hook
  - if such a hook was found, call it and return success;
     otherwise return ERROR_NOTIMPLEMENTED.

So ERROR_NOTIMPLEMENTED is more of a hint.
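
The steps above can be sketched as a small self-contained model. This is not the code from the patch series: the types and names (sketch_ao, sketch_ao_cancel, SK_*) are invented for illustration, and the return value used when the ao cannot be found is my assumption, since the list above does not say what happens in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in error codes; the values are arbitrary for this sketch. */
enum { SK_OK = 0, SK_ERROR_NOTIMPLEMENTED = -1, SK_ERROR_NOTFOUND = -2 };

struct sketch_ao {
    int  id;
    bool cancel_requested;                      /* the "note" in the ao */
    void (*cancel_hook)(struct sketch_ao *ao);  /* NULL if not hooked up */
    bool stopped;
};

/* Example hook: stop the operation immediately. */
static void sketch_stop_now(struct sketch_ao *ao) { ao->stopped = true; }

static int sketch_ao_cancel(struct sketch_ao *aos, size_t n, int id)
{
    assert(aos != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct sketch_ao *ao = &aos[i];
        if (ao->id != id)
            continue;                       /* still looking for the ao */
        ao->cancel_requested = true;        /* note that it ought to stop */
        if (ao->cancel_hook == NULL)        /* nothing has registered a hook */
            return SK_ERROR_NOTIMPLEMENTED;
        ao->cancel_hook(ao);                /* hook found: call it */
        return SK_OK;
    }
    return SK_ERROR_NOTFOUND;               /* assumption, see lead-in */
}

/* Called when the operation enters a phase where cancellation is
 * effective. An earlier request that got NOTIMPLEMENTED takes effect
 * here. Returns true if the operation has been stopped. */
static bool sketch_enter_cancellable_phase(struct sketch_ao *ao)
{
    ao->cancel_hook = sketch_stop_now;
    if (ao->cancel_requested)
        ao->cancel_hook(ao);
    return ao->stopped;
}
```

In this model the NOTIMPLEMENTED return only says that nothing could be done right away; the request stays recorded in the ao.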

If you prefer, it would be possible to make libxl_ao_cancel _not_ make
a note that the operation ought to be cancelled, in the case where
it's returning ERROR_NOTIMPLEMENTED.  Then the libxl_ao_cancel would
be guaranteed to have no effect.

But, if we do that, it won't be possible to mark a
currently-running-and-not-promptly-cancellable but
maybe-shortly-actually-cancellable operation as to be cancelled.

Perhaps, if this is confusing, the better answer is simply to return a
different error code instead of ERROR_NOTIMPLEMENTED, for example
  ERROR_CANCELLATION_DIFFICULT

> >  * Can you help me test it ?  Trying to test this in xl is going to be
> >    awkward and involve a lot of extraneous and very complicated signal
> >    handling; and AFAIAA libvirt doesn't have any cancellation
> >    facility.
> 
> Yes, of course.   However, wouldn't it also be useful for xl to gain
> the ability to cancel long-running operations by handling SIGINT?

As I say, making xl do something with signals is a substantial piece
of work in itself.

> >  * Any further comments (eg, re timescales etc).
> 
> None that we can think of at the moment.

Thanks,
Ian.


* Re: Cancelling asynchronous operations in libxl
  2015-01-28 16:57     ` Ian Jackson
@ 2015-02-02 17:43       ` Ian Jackson
  2015-02-03  9:59         ` Euan Harris
  2015-06-24 15:33       ` Euan Harris
  1 sibling, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2015-02-02 17:43 UTC (permalink / raw)
  To: Euan Harris, Dave Scott, Xen-devel, Ian Campbell

Ian Jackson writes ("Re: Cancelling asynchronous operations in libxl"):
> Euan Harris writes ("Re: Cancelling asynchronous operations in libxl"):
> > The API you propose for libxl_ao_cancel, as described in the comment in
> > libxl.h, looks reasonable to us.    The comment for ERROR_NOTIMPLEMENTED
> > is a bit confusing: under what circumstances might a task actually be
> > cancelled although libxl_ao_cancel returned ERROR_NOTIMPLEMENTED?
> 
> A single operation may go through phases during which cancellation is
> effective, and phases during which it is not very effective because it
> hasn't been properly hooked up.  [etc.]

Does that explanation answer your questions ?  What did you think of
my alternative suggestions ?

Ian.


* Re: Cancelling asynchronous operations in libxl
  2015-02-02 17:43       ` Ian Jackson
@ 2015-02-03  9:59         ` Euan Harris
  2015-02-03 12:04           ` Ian Jackson
  0 siblings, 1 reply; 10+ messages in thread
From: Euan Harris @ 2015-02-03  9:59 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Dave Scott, Ian Campbell, Xen-devel

Hi,

On Mon, Feb 02, 2015 at 05:43:58PM +0000, Ian Jackson wrote:
> Ian Jackson writes ("Re: Cancelling asynchronous operations in libxl"):
> > Euan Harris writes ("Re: Cancelling asynchronous operations in libxl"):
> > > The API you propose for libxl_ao_cancel, as described in the comment in
> > > libxl.h, looks reasonable to us.    The comment for ERROR_NOTIMPLEMENTED
> > > is a bit confusing: under what circumstances might a task actually be
> > > cancelled although libxl_ao_cancel returned ERROR_NOTIMPLEMENTED?
> > 
> > A single operation may go through phases during which cancellation is
> > effective, and phases during which it is not very effective because it
> > hasn't been properly hooked up.  [etc.]
> 
> Does that explanation answer your questions ?  What did you think of
> my alternative suggestions ?

Sorry, I didn't think you were waiting for a reply.   Your explanation
does answer my questions, thanks.

I think that the current proposed behaviour will suit us fine.   We will
probably treat the OK and NOTIMPLEMENTED cases in the same way, by using
more drastic means to stop the activity if cancellation is not confirmed
within a reasonable timeout.
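
As a rough sketch of what "more drastic means" could look like, assuming the worker-process model mentioned at the start of this thread: the caller gives the worker a deadline and falls back to SIGKILL when it expires. The function and constant names are hypothetical, only standard POSIX calls are used, and the libxl cancellation request itself is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum { WORKER_EXITED = 0, WORKER_KILLED = 1, WORKER_ERROR = -1 };

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Run work() in a child process. If the child has not ended within
 * roughly timeout_ms, kill it with SIGKILL and reap it. */
static int run_worker_with_deadline(void (*work)(void), long timeout_ms)
{
    assert(work != NULL && timeout_ms >= 0);

    pid_t pid = fork();
    if (pid < 0)
        return WORKER_ERROR;
    if (pid == 0) {             /* child: do the work, then exit */
        work();
        _exit(0);
    }

    int status;
    /* A real caller would issue its cancellation request before this
     * loop; here we only wait for the worker to end by itself. */
    for (long waited = 0; waited < timeout_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WORKER_EXITED;   /* ended without our intervention */
        if (r < 0)
            return WORKER_ERROR;
        sleep_ms(10);
    }

    /* Deadline passed: last resort. */
    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) != pid)
        return WORKER_ERROR;
    return WIFSIGNALED(status) ? WORKER_KILLED : WORKER_EXITED;
}

/* Two toy workers for trying this out. */
static void quick_work(void) { }
static void stuck_work(void) { for (;;) pause(); }
```

Calling run_worker_with_deadline(stuck_work, 100) should report WORKER_KILLED, while quick_work should be reported as WORKER_EXITED. Whatever the killed worker was doing is left half-finished, so the caller still has to clean up afterwards.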

Thanks,
Euan


* Re: Cancelling asynchronous operations in libxl
  2015-02-03  9:59         ` Euan Harris
@ 2015-02-03 12:04           ` Ian Jackson
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Jackson @ 2015-02-03 12:04 UTC (permalink / raw)
  To: Euan Harris; +Cc: Dave Scott, Ian Campbell, Xen-devel

Euan Harris writes ("Re: Cancelling asynchronous operations in libxl"):
> Sorry, I didn't think you were waiting for a reply.   Your explanation
> does answer my questions, thanks.

Oh, good, thanks.

> I think that the current proposed behaviour will suit us fine.   We will
> probably treat the OK and NOTIMPLEMENTED cases in the same way, by using
> more drastic means to stop the activity if cancellation is not confirmed
> within a reasonable timeout.

Right.  I will rebase my series and get it to compile and do some
smoke tests.

Thanks,
Ian.


* Re: Cancelling asynchronous operations in libxl
  2015-01-28 16:57     ` Ian Jackson
  2015-02-02 17:43       ` Ian Jackson
@ 2015-06-24 15:33       ` Euan Harris
  2015-06-24 15:41         ` Ian Jackson
  1 sibling, 1 reply; 10+ messages in thread
From: Euan Harris @ 2015-06-24 15:33 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Dave Scott, Ian Campbell, Xen-devel

Hi,

On Wed, Jan 28, 2015 at 04:57:19PM +0000, Ian Jackson wrote:
> Euan Harris writes ("Re: Cancelling asynchronous operations in libxl"):
> > On Tue, Jan 20, 2015 at 04:38:24PM +0000, Ian Jackson wrote:
> > >  * Is an API along these lines going to meet your needs ?
> > 
> > The API you propose for libxl_ao_cancel, as described in the comment in
> > libxl.h, looks reasonable to us.    The comment for ERROR_NOTIMPLEMENTED
> > is a bit confusing: under what circumstances might a task actually be
> > cancelled although libxl_ao_cancel returned ERROR_NOTIMPLEMENTED?
> 
> A single operation may go through phases during which cancellation is
> effective, and phases during which it is not very effective because it
> hasn't been properly hooked up.  If libxl_ao_cancel is called during
> the latter, it will return ERROR_NOTIMPLEMENTED but the operation will
> still be marked as wanting-cancellation, so if it enters a phase where
> cancellation is effective, it will stop at that point.
> 
> To put it another way, what libxl_ao_cancel does is:
>   - find the ao in question, hopefully
>   - make a note in the ao that it ought to be cancelled
>   - look for something internal that has registered a
>      cancellation hook
>   - if such a hook was found, call it and return success;
>      otherwise return ERROR_NOTIMPLEMENTED.
> 
> So ERROR_NOTIMPLEMENTED is more of a hint.
> 
> If you prefer, it would be possible to make libxl_ao_cancel _not_ make
> a note that the operation ought to be cancelled, in the case where
> it's returning ERROR_NOTIMPLEMENTED.  Then the libxl_ao_cancel would
> be guaranteed to have no effect.
> 
> But, if we do that, it won't be possible to mark a
> currently-running-and-not-promptly-cancellable but
> maybe-shortly-actually-cancellable operation as to be cancelled.
> 
> Perhaps, if this is confusing, the better answer is simply to return a
> different error code instead of ERROR_NOTIMPLEMENTED, for example
>   ERROR_CANCELLATION_DIFFICULT

We've discussed the semantics of cancellation a bit more off-list and
have come to two conclusions:

  1.  The behaviour of the current libxl_ao_cancel() proposal is more akin
      to 'abort' than 'cancel'.   This is because the proposed
      implementation can't guarantee the state of the domain after
      cancellation - it might be fine, it might be dead, or it might
      be in some unanticipated limbo state, depending on just when the
      cancellation call took effect.

      We should rename the proposed libxl_ao_cancel() to libxl_ao_abort().
      This function will be defined as a best-effort way to kill an
      asynchronous operation, and will give no guarantees about the
      state of the affected domain afterwards.   We may add a true
      libxl_ao_cancel() function later, with better guarantees about the
      state of the domain afterwards.   libxl_ao_abort(), as defined here,
      covers many of our requirements in Xapi.

  2.  We should remove the ERROR_NOTIMPLEMENTED error code.    It does
      not add much value, because cancellation is implemented in terms
      of underlying primitive operations, rather than API operations.
      Any async API operation may be cancellable in principle, and whether
      this error code is returned depends on exactly what primitive
      operation happens to be in progress when libxl_ao_cancel/_abort() is
      called.   Furthermore, even if the call to libxl_ao_cancel/_abort()
      returns NOTIMPLEMENTED, the operation may be cancelled anyway when
      it starts a cancellable primitive operation.

      The semantics of libxl_ao_cancel/_abort() are defined as best effort,
      so it suffices to have just two return codes:

        0: The request to cancel/abort has been noted, and it may or may 
           not happen.   To find out which, check the eventual return code
           of the async operation.

        ERROR_NOTFOUND: the operation to be cancelled has already completed.

Thanks,
Euan


* Re: Cancelling asynchronous operations in libxl
  2015-06-24 15:33       ` Euan Harris
@ 2015-06-24 15:41         ` Ian Jackson
  2015-06-25 10:40           ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Jackson @ 2015-06-24 15:41 UTC (permalink / raw)
  To: Euan Harris
  Cc: Dave Scott, Stefano Stabellini, Wei Liu, Ian Campbell, Xen-devel

Euan Harris writes ("Re: Cancelling asynchronous operations in libxl"):
> We've discussed the semantics of cancellation a bit more off-list and
> have come to two conclusions:
> 
>   1.  [...]
> 
>       We should rename the proposed libxl_ao_cancel() to libxl_ao_abort().

Unless someone objects to this, I will do this in my next
rebase/resend.

(CCing a slightly wider set of people who may be interested in libxl
API semantics.)

>       This function will be defined as a best-effort way to kill an
>       asynchronous operation, and will give no guarantees about the
>       state of the affected domain afterwards.   We may add a true
>       libxl_ao_cancel() function later, with better guarantees about the
>       state of the domain afterwards.   libxl_ao_abort(), as defined here,
>       covers many of our requirements in Xapi.

My plan for implementing (eventually) libxl_ao_cancel is that
it works much like abort, except that operations can:

 * block and unblock cancellation during critical sections

 * declare an ao "committed", causing cancellation requests to all fail

 * divert cancellation requests to a special handler (which could
   start to try to undo the operation, for example)
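
A minimal sketch of how those three facilities might fit together, purely as an illustration of the plan and not as a proposed API. All names (plan_ao, plan_block, plan_commit and so on) are hypothetical, and two details are my assumptions: that a request arriving inside a critical section is deferred until the matching unblock, and that a committed ao rejects requests with a distinct error code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes for this sketch only. */
enum { PLAN_OK = 0, PLAN_ERROR_COMMITTED = -1 };

struct plan_ao {
    int  block_depth;   /* > 0 while inside a critical section */
    bool committed;     /* once set, cancellation requests fail */
    bool pending;       /* a request arrived while blocked */
    bool cancelled;     /* default effect of a delivered request */
    void (*divert)(struct plan_ao *ao);  /* optional special handler */
    bool undo_started;  /* set by the example handler below */
};

/* Deliver a request: to the special handler if there is one,
 * otherwise by marking the ao cancelled. */
static void plan_deliver(struct plan_ao *ao)
{
    if (ao->divert)
        ao->divert(ao);
    else
        ao->cancelled = true;
}

static void plan_block(struct plan_ao *ao) { ao->block_depth++; }

/* Leaving the outermost critical section delivers a deferred request. */
static void plan_unblock(struct plan_ao *ao)
{
    assert(ao->block_depth > 0);
    if (--ao->block_depth == 0 && ao->pending && !ao->committed) {
        ao->pending = false;
        plan_deliver(ao);
    }
}

static void plan_commit(struct plan_ao *ao) { ao->committed = true; }

static int plan_cancel(struct plan_ao *ao)
{
    if (ao->committed)
        return PLAN_ERROR_COMMITTED;    /* too late to cancel */
    if (ao->block_depth > 0)
        ao->pending = true;             /* defer until unblocked */
    else
        plan_deliver(ao);
    return PLAN_OK;
}

/* Example special handler: start undoing instead of just stopping. */
static void plan_start_undo(struct plan_ao *ao) { ao->undo_started = true; }
```

Whether a deferred request should be dropped or delivered when the ao commits while blocked is left open here; this sketch drops it.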

...
>       The semantics of libxl_ao_cancel/_abort() are defined as best effort,
>       so it suffices to have just two return codes:
> 
>         0: The request to cancel/abort has been noted, and it may or may 
>            not happen.   To find out which, check the eventual return code
>            of the async operation.
> 
>         ERROR_NOTFOUND: the operation to be cancelled has already completed.

ERROR_NOTFOUND might also mean that the operation has not yet
started.  For example, the call to libxl_domain_create_new might be on
its way into libxl and be waiting for the libxl ctx lock.
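
To illustrate how a caller might interpret the two return codes together with the caveat above, here is a small sketch. The names and values are made up for the example, and the dedicated "aborted" result of the async operation (RC_ERROR_ABORTED) is an assumption on my part; the proposal only says to check the operation's eventual return code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative codes only; values and names are not libxl's. */
enum { RC_OK = 0, RC_ERROR_NOTFOUND = -1, RC_ERROR_ABORTED = -2 };

enum verdict {
    VERDICT_WAIT,        /* request noted; wait for the ao's own result */
    VERDICT_ABORTED,     /* the ao finished and reported it was aborted */
    VERDICT_RAN_TO_END,  /* the ao finished with some other result */
    VERDICT_TOO_EARLY,   /* NOTFOUND and no result yet: not started */
    VERDICT_TOO_LATE     /* NOTFOUND and the ao had already completed */
};

/* abort_rc:    what the abort/cancel call returned
 * ao_finished: whether the caller has seen the ao's completion
 * ao_rc:       the ao's own return code (only valid if ao_finished) */
static enum verdict classify_abort(int abort_rc, bool ao_finished, int ao_rc)
{
    if (abort_rc == RC_ERROR_NOTFOUND)
        return ao_finished ? VERDICT_TOO_LATE : VERDICT_TOO_EARLY;

    assert(abort_rc == RC_OK);
    if (!ao_finished)
        return VERDICT_WAIT;
    return ao_rc == RC_ERROR_ABORTED ? VERDICT_ABORTED : VERDICT_RAN_TO_END;
}
```

In the TOO_EARLY case the caller would presumably have to retry the request once the operation has actually entered libxl.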

Ian.


* Re: Cancelling asynchronous operations in libxl
  2015-06-24 15:41         ` Ian Jackson
@ 2015-06-25 10:40           ` Ian Campbell
  0 siblings, 0 replies; 10+ messages in thread
From: Ian Campbell @ 2015-06-25 10:40 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Wei Liu, Dave Scott, Stefano Stabellini, Euan Harris, Xen-devel

On Wed, 2015-06-24 at 16:41 +0100, Ian Jackson wrote:
> Euan Harris writes ("Re: Cancelling asynchronous operations in libxl"):
> > We've discussed the semantics of cancellation a bit more off-list and
> > have come to two conclusions:
> > 
> >   1.  [...]
> > 
> >       We should rename the proposed libxl_ao_cancel() to libxl_ao_abort().
> 
> Unless someone objects to this, I will do this in my next
> rebase/resend.
> 
> (CCing a slightly wider set of people who may be interested in libxl
> API semantics.)

FWIW it seems fine to me...

> 
> >       This function will be defined as a best-effort way to kill an
> >       asynchronous operation, and will give no guarantees about the
> >       state of the affected domain afterwards.   We may add a true
> >       libxl_ao_cancel() function later, with better guarantees about the
> >       state of the domain afterwards.   libxl_ao_abort(), as defined here,
> >       covers many of our requirements in Xapi.
> 
> My plan for implementing (eventually) libxl_ao_cancel is that
> it works much like abort, except that operations can:
> 
>  * block and unblock cancellation during critical sections
> 
>  * declare an ao "committed", causing cancellation requests to all fail
> 
>  * divert cancellation requests to a special handler (which could
>    start to try to undo the operation, for example)
> 
> ...
> >       The semantics of libxl_ao_cancel/_abort() are defined as best effort,
> >       so it suffices to have just two return codes:
> > 
> >         0: The request to cancel/abort has been noted, and it may or may 
> >            not happen.   To find out which, check the eventual return code
> >            of the async operation.
> > 
> >         ERROR_NOTFOUND: the operation to be cancelled has already completed.
> 
> ERROR_NOTFOUND might also mean that the operation has not yet
> started.  For example, the call to libxl_domain_create_new might be on
> its way into libxl and be waiting for the libxl ctx lock.
> 
> Ian.
