linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Improving AIO cancellation
@ 2013-02-08  3:42 Anatol Pomozov
  2013-02-08  8:32 ` Bart Van Assche
  2013-02-08 17:38 ` Zach Brown
  0 siblings, 2 replies; 6+ messages in thread
From: Anatol Pomozov @ 2013-02-08  3:42 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Theodore Ts'o, koverstreet, linux-aio, linux-kernel

Hi,

At Google we have several applications that heavily use asynchronous
IO. One thing that our userspace developers need is effective AIO
cancellation. You might say "sure use io_cancel syscall". Well, while
it cancels AIO requests it does it ineffectively. Currently (I am
looking at linux-next) io_cancel only marks kiocb as cancelled. The
bios still be issued to device even after kiocb was cancelled. Let's
say you have a congested device and want to cancel some AIO requests -
io_cancel will not make situation better. We would like to see more
resource effective AIO cancellation.

I had a discussion with Ted Tso and Kent Overstreet about improving
this situation and would like to share the ideas with you, linux
community.

Once direct async IO is submitted the request can be at several stages:
1) Sitting in kernel request queue of a congested device
2) Sent to device and sitting in device queue (if NCQ is enabled)
3) Executing on device

Ideally if we can cancel an IO request on any of these stages. But
currently we are especially interested in case #1. I do not know if
cancellation at stage #2 and #3 is possible and/or reasonable.

BTW AIO cancellation makes sense only for direct IO. Buffered AIO will
end up in buffer soon and kiocb will be marked as completed. Later
(maybe much later) writeback will flush those buffers to disk, but you
cannot cancel it..

And yet another thing to remember is md/RAID. Some types of raid
support stripes consistency. When md splits a WRITE across disks
either all or no of the child requests should be completed. If we do
partial write then the disk data will become inconsistent.


Ted and Kent suggested following solution: any time when we do forward
progress with request/bio we need to check its status. If user
cancelled the request then just skip this bio. So it covers case #1.

The draft implementation will look like this. struct bio should have
some way to get current status of kiocb that generated bio. So we add
a pointer to bool flag.

struct bio {
  bool *cancelled;
}

in async DIO codepath this pointer will be initialized with bool at
"struct kiocb"
  bio->cancelled = &kiocb->cancelled;
except md. If it is RAID5 and we perform WRITE request then we do not
initialize this pointer.


when we do forward progress with request/bio we check its cancellation status:
if (bio->cancelled && *bio->cancelled)
  goto do_not_process_bio_because_it_cancelled;

So to cancel kiocb we do
kiocb->cancelled = true;
and all bio created from the request will not be send to device anymore.


The solution seems straightforward, but I would like to hear if there
are other solutions to make AIO cancellation better. Does suggested
implementation looks good? Are there better solutions? What about
cancelling requests that are already sent to device?

If the proposal is fine then I start implementing it.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Improving AIO cancellation
  2013-02-08  3:42 Improving AIO cancellation Anatol Pomozov
@ 2013-02-08  8:32 ` Bart Van Assche
  2013-02-08 17:38 ` Zach Brown
  1 sibling, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2013-02-08  8:32 UTC (permalink / raw)
  To: Anatol Pomozov
  Cc: linux-fsdevel, Theodore Ts'o, koverstreet, linux-aio, linux-kernel

On 02/08/13 04:42, Anatol Pomozov wrote:
> Hi,
>
> At Google we have several applications that heavily use asynchronous
> IO. One thing that our userspace developers need is effective AIO
> cancellation. You might say "sure use io_cancel syscall". Well, while
> it cancels AIO requests it does it ineffectively. Currently (I am
> looking at linux-next) io_cancel only marks kiocb as cancelled. The
> bios still be issued to device even after kiocb was cancelled. Let's
> say you have a congested device and want to cancel some AIO requests -
> io_cancel will not make situation better. We would like to see more
> resource effective AIO cancellation.
>
> I had a discussion with Ted Tso and Kent Overstreet about improving
> this situation and would like to share the ideas with you, linux
> community.
>
> Once direct async IO is submitted the request can be at several stages:
> 1) Sitting in kernel request queue of a congested device
> 2) Sent to device and sitting in device queue (if NCQ is enabled)
> 3) Executing on device
>
> Ideally if we can cancel an IO request on any of these stages. But
> currently we are especially interested in case #1. I do not know if
> cancellation at stage #2 and #3 is possible and/or reasonable.
>
> BTW AIO cancellation makes sense only for direct IO. Buffered AIO will
> end up in buffer soon and kiocb will be marked as completed. Later
> (maybe much later) writeback will flush those buffers to disk, but you
> cannot cancel it..
>
> And yet another thing to remember is md/RAID. Some types of raid
> support stripes consistency. When md splits a WRITE across disks
> either all or no of the child requests should be completed. If we do
> partial write then the disk data will become inconsistent.
>
>
> Ted and Kent suggested following solution: any time when we do forward
> progress with request/bio we need to check its status. If user
> cancelled the request then just skip this bio. So it covers case #1.
>
> The draft implementation will look like this. struct bio should have
> some way to get current status of kiocb that generated bio. So we add
> a pointer to bool flag.
>
> struct bio {
>    bool *cancelled;
> }
>
> in async DIO codepath this pointer will be initialized with bool at
> "struct kiocb"
>    bio->cancelled = &kiocb->cancelled;
> except md. If it is RAID5 and we perform WRITE request then we do not
> initialize this pointer.
>
>
> when we do forward progress with request/bio we check its cancellation status:
> if (bio->cancelled && *bio->cancelled)
>    goto do_not_process_bio_because_it_cancelled;
>
> So to cancel kiocb we do
> kiocb->cancelled = true;
> and all bio created from the request will not be send to device anymore.
>
>
> The solution seems straightforward, but I would like to hear if there
> are other solutions to make AIO cancellation better. Does suggested
> implementation looks good? Are there better solutions? What about
> cancelling requests that are already sent to device?
>
> If the proposal is fine then I start implementing it.

Hello Anatol,

Had you already noticed this message: 
http://marc.info/?l=linux-fsdevel&m=136024044202362 ?

Bart.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Improving AIO cancellation
  2013-02-08  3:42 Improving AIO cancellation Anatol Pomozov
  2013-02-08  8:32 ` Bart Van Assche
@ 2013-02-08 17:38 ` Zach Brown
  2013-02-08 21:11   ` Kent Overstreet
  1 sibling, 1 reply; 6+ messages in thread
From: Zach Brown @ 2013-02-08 17:38 UTC (permalink / raw)
  To: Anatol Pomozov
  Cc: linux-fsdevel, Theodore Ts'o, koverstreet, linux-aio, linux-kernel

> The draft implementation will look like this. struct bio should have
> some way to get current status of kiocb that generated bio. So we add
> a pointer to bool flag.
> 
> struct bio {
>   bool *cancelled;
> }
> 
> in async DIO codepath this pointer will be initialized with bool at
> "struct kiocb"
>   bio->cancelled = &kiocb->cancelled;

Hmm.  I suppose.  It sure looks disgusting, but the iocb has forgotten
all of the dio state and bios that lead up to the operation once
submission has completed.  It's easy enough to match the lifetime rules
of this reference with end_io's using the iocb.

This method also lets one cancelled iocb flag many submitted bios as
cancelled, so that's nice.

> So to cancel kiocb we do
> kiocb->cancelled = true;
> and all bio created from the request will not be send to device anymore.

(With lots of comments to let us know that it being racey is fine.)

> If the proposal is fine then I start implementing it.

For a teeny hack like this the best proposal is a working prototype
patch.  Don't wait for acks just dive in.

- z

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Improving AIO cancellation
  2013-02-08 17:38 ` Zach Brown
@ 2013-02-08 21:11   ` Kent Overstreet
  2013-02-08 21:44     ` Zach Brown
  2013-02-12  0:15     ` Anatol Pomozov
  0 siblings, 2 replies; 6+ messages in thread
From: Kent Overstreet @ 2013-02-08 21:11 UTC (permalink / raw)
  To: Zach Brown
  Cc: Anatol Pomozov, linux-fsdevel, Theodore Ts'o, linux-aio,
	linux-kernel

On Fri, Feb 08, 2013 at 09:38:11AM -0800, Zach Brown wrote:
> > The draft implementation will look like this. struct bio should have
> > some way to get current status of kiocb that generated bio. So we add
> > a pointer to bool flag.
> > 
> > struct bio {
> >   bool *cancelled;
> > }
> > 
> > in async DIO codepath this pointer will be initialized with bool at
> > "struct kiocb"
> >   bio->cancelled = &kiocb->cancelled;
> 
> Hmm.  I suppose.  It sure looks disgusting, but the iocb has forgotten
> all of the dio state and bios that lead up to the operation once
> submission has completed.  It's easy enough to match the lifetime rules
> of this reference with end_io's using the iocb.

That's part of it (w.r.t. the exact location of the cancelled flag), but
the idea also isn't for this to be specific to kiocbs or anything like
that.

The problem, as you alluded to, is once a bio passes
generic_make_request() the upper layer has no idea what the block layer
is doing with it. If it's a stacking block device - or even if it's just
getting bounced - the first thing that happens to the bio is it gets
cloned.

So, adding a mechanism for the upper layer to chase down those bios and
whatever other state the lower layers created would be gross. Fuck that.
And, you know my thoughts on callbacks; Ted pointed out some reasons we
are probably going to need cancellation callbacks eventually but I
personally _don't_ want this to get designed around callbacks.

But if we just add this layer of indirection, when a bio is cloned we
can copy the pointer too and cancellation magically Just Works.

For md writes and probably some other things we won't be able to just
blindly copy the pointer as that'd fuck up md's parity stuff, but md
could still check the flag itself if it was worth the trouble.



> 
> This method also lets one cancelled iocb flag many submitted bios as
> cancelled, so that's nice.
> 
> > So to cancel kiocb we do
> > kiocb->cancelled = true;
> > and all bio created from the request will not be send to device anymore.
> 
> (With lots of comments to let us know that it being racey is fine.)

Yeah. We should be really explicit about what this flag means; it
_doesn't_ mean "this bio has been cancelled", it means "please try to
cancel this bio".

We don't know if the bio was succesfully cancelled until the bio is
completed (and this doesn't change anything about how bio completion
works) - if it was cancelled, the bi_end_io callback will get
-ECANCELLED or something.

This is very different from the existing AIO cancellation; KIF_CANCEL
means "this kiocb _has been cancelled_". (sort of, I was just rereading
that code and I'm convinced it's buggy).

> > If the proposal is fine then I start implementing it.
> 
> For a teeny hack like this the best proposal is a working prototype
> patch.  Don't wait for acks just dive in.

Yeah, and I keep claiming I'm going to implement the AIO bits of this
(I keep getting distracted by other things like taking a flamethrower to
the dio code).

Note that the semantics of this doesn't really match up with
io_cancel(). That's not the end of the world, we can paper over that in
the aio code without too much grossness (or at least it'll be localized
grossness).

IMO though, io_cancel() is dumb and we want something new. It's
synchronous - and why anyone ever thought cancellation for
_asynchronous_ io should be synchronous I don't know.

Anyways, if io_cancel() is going to succeed it has to block until the io
has been cancelled and it's got a completion - the completion isn't
returned via io_getevents().

This makes _no sense_, and means an application that is making use of
this probably is going to need a thread pool just to do cancellation.

What we want is a new io_cancel_sane() syscall that doesn't return
anything, and only marks the iocb as "cancellation pending". The
completion would still get returned via io_getevents(), and userspace
would find out if it was cancelled or not then.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Improving AIO cancellation
  2013-02-08 21:11   ` Kent Overstreet
@ 2013-02-08 21:44     ` Zach Brown
  2013-02-12  0:15     ` Anatol Pomozov
  1 sibling, 0 replies; 6+ messages in thread
From: Zach Brown @ 2013-02-08 21:44 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Anatol Pomozov, linux-fsdevel, Theodore Ts'o, linux-aio,
	linux-kernel

> We don't know if the bio was succesfully cancelled until the bio is
> completed (and this doesn't change anything about how bio completion
> works) - if it was cancelled, the bi_end_io callback will get
> -ECANCELLED or something.

Yeah, I think that's the right design.

> This is very different from the existing AIO cancellation; KIF_CANCEL
> means "this kiocb _has been cancelled_". (sort of, I was just rereading
> that code and I'm convinced it's buggy).

I don't recall the details, but the last time I looked at it I was also
convinced that it was buggy.

> What we want is a new io_cancel_sane() syscall that doesn't return
> anything, and only marks the iocb as "cancellation pending". The
> completion would still get returned via io_getevents(), and userspace
> would find out if it was cancelled or not then.

Agreed.  I wonder if we can mash those semantics in to the existing call
somehow.  This *does* have access to the iocb, so our hands aren't as
tied as we are with io_setup()'s inability to request behavioural
changes.

Maybe we mark the submitted iocbs as ASYNC_BEST_EFFORT_CANCEL and then
io_cancel() on them can return success and completion will proceeed as
normal.  Maybe much more quickly with a cancelled status but maybe not.

- z

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Improving AIO cancellation
  2013-02-08 21:11   ` Kent Overstreet
  2013-02-08 21:44     ` Zach Brown
@ 2013-02-12  0:15     ` Anatol Pomozov
  1 sibling, 0 replies; 6+ messages in thread
From: Anatol Pomozov @ 2013-02-12  0:15 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Zach Brown, linux-fsdevel, Theodore Ts'o, linux-aio, linux-kernel

Hi

On Fri, Feb 8, 2013 at 1:11 PM, Kent Overstreet <koverstreet@google.com> wrote:
> On Fri, Feb 08, 2013 at 09:38:11AM -0800, Zach Brown wrote:
>> > The draft implementation will look like this. struct bio should have
>> > some way to get current status of kiocb that generated bio. So we add
>> > a pointer to bool flag.
>> >
>> > struct bio {
>> >   bool *cancelled;
>> > }
>> >
>> > in async DIO codepath this pointer will be initialized with bool at
>> > "struct kiocb"
>> >   bio->cancelled = &kiocb->cancelled;
>>
>> Hmm.  I suppose.  It sure looks disgusting, but the iocb has forgotten
>> all of the dio state and bios that lead up to the operation once
>> submission has completed.  It's easy enough to match the lifetime rules
>> of this reference with end_io's using the iocb.
>
> That's part of it (w.r.t. the exact location of the cancelled flag), but
> the idea also isn't for this to be specific to kiocbs or anything like
> that.
>
> The problem, as you alluded to, is once a bio passes
> generic_make_request() the upper layer has no idea what the block layer
> is doing with it. If it's a stacking block device - or even if it's just
> getting bounced - the first thing that happens to the bio is it gets
> cloned.
>
> So, adding a mechanism for the upper layer to chase down those bios and
> whatever other state the lower layers created would be gross. Fuck that.
> And, you know my thoughts on callbacks; Ted pointed out some reasons we
> are probably going to need cancellation callbacks eventually but I
> personally _don't_ want this to get designed around callbacks.
>
> But if we just add this layer of indirection, when a bio is cloned we
> can copy the pointer too and cancellation magically Just Works.
>
> For md writes and probably some other things we won't be able to just
> blindly copy the pointer as that'd fuck up md's parity stuff, but md
> could still check the flag itself if it was worth the trouble.
>
>
>
>>
>> This method also lets one cancelled iocb flag many submitted bios as
>> cancelled, so that's nice.
>>
>> > So to cancel kiocb we do
>> > kiocb->cancelled = true;
>> > and all bio created from the request will not be send to device anymore.
>>
>> (With lots of comments to let us know that it being racey is fine.)
>
> Yeah. We should be really explicit about what this flag means; it
> _doesn't_ mean "this bio has been cancelled", it means "please try to
> cancel this bio".
>
> We don't know if the bio was succesfully cancelled until the bio is
> completed (and this doesn't change anything about how bio completion
> works) - if it was cancelled, the bi_end_io callback will get
> -ECANCELLED or something.
>
> This is very different from the existing AIO cancellation; KIF_CANCEL
> means "this kiocb _has been cancelled_". (sort of, I was just rereading
> that code and I'm convinced it's buggy).
>
>> > If the proposal is fine then I start implementing it.
>>
>> For a teeny hack like this the best proposal is a working prototype
>> patch.  Don't wait for acks just dive in.
>
> Yeah, and I keep claiming I'm going to implement the AIO bits of this
> (I keep getting distracted by other things like taking a flamethrower to
> the dio code).
>
> Note that the semantics of this doesn't really match up with
> io_cancel(). That's not the end of the world, we can paper over that in
> the aio code without too much grossness (or at least it'll be localized
> grossness).
>
> IMO though, io_cancel() is dumb and we want something new. It's
> synchronous - and why anyone ever thought cancellation for
> _asynchronous_ io should be synchronous I don't know.

What do mean "cancel is synchronous"? Are you saying that the
cancellation callback (field ki_cancel) is called in io_cancel() and
blocks the syscall? In this case it is a rare situation - currently
cancel callback is used only in drivers/usb/gadget/inode.c. In all
other cases EINVAL is returned. It means that io_cancel always returns
the error result for unfinished async io operation run with
io_submit().

>
> Anyways, if io_cancel() is going to succeed it has to block until the io
> has been cancelled and it's got a completion - the completion isn't
> returned via io_getevents().
>
> This makes _no sense_, and means an application that is making use of
> this probably is going to need a thread pool just to do cancellation.
>
> What we want is a new io_cancel_sane() syscall that doesn't return
> anything, and only marks the iocb as "cancellation pending". The
> completion would still get returned via io_getevents(), and userspace
> would find out if it was cancelled or not then.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-02-12  0:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-08  3:42 Improving AIO cancellation Anatol Pomozov
2013-02-08  8:32 ` Bart Van Assche
2013-02-08 17:38 ` Zach Brown
2013-02-08 21:11   ` Kent Overstreet
2013-02-08 21:44     ` Zach Brown
2013-02-12  0:15     ` Anatol Pomozov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).