* Improving AIO cancellation @ 2013-02-08 3:42 Anatol Pomozov 2013-02-08 8:32 ` Bart Van Assche 2013-02-08 17:38 ` Zach Brown 0 siblings, 2 replies; 6+ messages in thread From: Anatol Pomozov @ 2013-02-08 3:42 UTC (permalink / raw) To: linux-fsdevel; +Cc: Theodore Ts'o, koverstreet, linux-aio, linux-kernel Hi, At Google we have several applications that heavily use asynchronous IO. One thing that our userspace developers need is effective AIO cancellation. You might say "sure use io_cancel syscall". Well, while it cancels AIO requests it does it ineffectively. Currently (I am looking at linux-next) io_cancel only marks kiocb as cancelled. The bios still be issued to device even after kiocb was cancelled. Let's say you have a congested device and want to cancel some AIO requests - io_cancel will not make situation better. We would like to see more resource effective AIO cancellation. I had a discussion with Ted Tso and Kent Overstreet about improving this situation and would like to share the ideas with you, linux community. Once direct async IO is submitted the request can be at several stages: 1) Sitting in kernel request queue of a congested device 2) Sent to device and sitting in device queue (if NCQ is enabled) 3) Executing on device Ideally if we can cancel an IO request on any of these stages. But currently we are especially interested in case #1. I do not know if cancellation at stage #2 and #3 is possible and/or reasonable. BTW AIO cancellation makes sense only for direct IO. Buffered AIO will end up in buffer soon and kiocb will be marked as completed. Later (maybe much later) writeback will flush those buffers to disk, but you cannot cancel it.. And yet another thing to remember is md/RAID. Some types of raid support stripes consistency. When md splits a WRITE across disks either all or no of the child requests should be completed. If we do partial write then the disk data will become inconsistent. Ted and Kent suggested following solution: any time when we do forward progress with request/bio we need to check its status. If user cancelled the request then just skip this bio. So it covers case #1. The draft implementation will look like this. struct bio should have some way to get current status of kiocb that generated bio. So we add a pointer to bool flag. struct bio { bool *cancelled; } in async DIO codepath this pointer will be initialized with bool at "struct kiocb" bio->cancelled = &kiocb->cancelled; except md. If it is RAID5 and we perform WRITE request then we do not initialize this pointer. when we do forward progress with request/bio we check its cancellation status: if (bio->cancelled && *bio->cancelled) goto do_not_process_bio_because_it_cancelled; So to cancel kiocb we do kiocb->cancelled = true; and all bio created from the request will not be send to device anymore. The solution seems straightforward, but I would like to hear if there are other solutions to make AIO cancellation better. Does suggested implementation looks good? Are there better solutions? What about cancelling requests that are already sent to device? If the proposal is fine then I start implementing it. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Improving AIO cancellation 2013-02-08 3:42 Improving AIO cancellation Anatol Pomozov @ 2013-02-08 8:32 ` Bart Van Assche 2013-02-08 17:38 ` Zach Brown 1 sibling, 0 replies; 6+ messages in thread From: Bart Van Assche @ 2013-02-08 8:32 UTC (permalink / raw) To: Anatol Pomozov Cc: linux-fsdevel, Theodore Ts'o, koverstreet, linux-aio, linux-kernel On 02/08/13 04:42, Anatol Pomozov wrote: > Hi, > > At Google we have several applications that heavily use asynchronous > IO. One thing that our userspace developers need is effective AIO > cancellation. You might say "sure use io_cancel syscall". Well, while > it cancels AIO requests it does it ineffectively. Currently (I am > looking at linux-next) io_cancel only marks kiocb as cancelled. The > bios still be issued to device even after kiocb was cancelled. Let's > say you have a congested device and want to cancel some AIO requests - > io_cancel will not make situation better. We would like to see more > resource effective AIO cancellation. > > I had a discussion with Ted Tso and Kent Overstreet about improving > this situation and would like to share the ideas with you, linux > community. > > Once direct async IO is submitted the request can be at several stages: > 1) Sitting in kernel request queue of a congested device > 2) Sent to device and sitting in device queue (if NCQ is enabled) > 3) Executing on device > > Ideally if we can cancel an IO request on any of these stages. But > currently we are especially interested in case #1. I do not know if > cancellation at stage #2 and #3 is possible and/or reasonable. > > BTW AIO cancellation makes sense only for direct IO. Buffered AIO will > end up in buffer soon and kiocb will be marked as completed. Later > (maybe much later) writeback will flush those buffers to disk, but you > cannot cancel it.. > > And yet another thing to remember is md/RAID. Some types of raid > support stripes consistency. When md splits a WRITE across disks > either all or no of the child requests should be completed. If we do > partial write then the disk data will become inconsistent. > > > Ted and Kent suggested following solution: any time when we do forward > progress with request/bio we need to check its status. If user > cancelled the request then just skip this bio. So it covers case #1. > > The draft implementation will look like this. struct bio should have > some way to get current status of kiocb that generated bio. So we add > a pointer to bool flag. > > struct bio { > bool *cancelled; > } > > in async DIO codepath this pointer will be initialized with bool at > "struct kiocb" > bio->cancelled = &kiocb->cancelled; > except md. If it is RAID5 and we perform WRITE request then we do not > initialize this pointer. > > > when we do forward progress with request/bio we check its cancellation status: > if (bio->cancelled && *bio->cancelled) > goto do_not_process_bio_because_it_cancelled; > > So to cancel kiocb we do > kiocb->cancelled = true; > and all bio created from the request will not be send to device anymore. > > > The solution seems straightforward, but I would like to hear if there > are other solutions to make AIO cancellation better. Does suggested > implementation looks good? Are there better solutions? What about > cancelling requests that are already sent to device? > > If the proposal is fine then I start implementing it. Hello Anatol, Had you already noticed this message: http://marc.info/?l=linux-fsdevel&m=136024044202362 ? Bart. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Improving AIO cancellation 2013-02-08 3:42 Improving AIO cancellation Anatol Pomozov 2013-02-08 8:32 ` Bart Van Assche @ 2013-02-08 17:38 ` Zach Brown 2013-02-08 21:11 ` Kent Overstreet 1 sibling, 1 reply; 6+ messages in thread From: Zach Brown @ 2013-02-08 17:38 UTC (permalink / raw) To: Anatol Pomozov Cc: linux-fsdevel, Theodore Ts'o, koverstreet, linux-aio, linux-kernel > The draft implementation will look like this. struct bio should have > some way to get current status of kiocb that generated bio. So we add > a pointer to bool flag. > > struct bio { > bool *cancelled; > } > > in async DIO codepath this pointer will be initialized with bool at > "struct kiocb" > bio->cancelled = &kiocb->cancelled; Hmm. I suppose. It sure looks disgusting, but the iocb has forgotten all of the dio state and bios that lead up to the operation once submission has completed. It's easy enough to match the lifetime rules of this reference with end_io's using the iocb. This method also lets one cancelled iocb flag many submitted bios as cancelled, so that's nice. > So to cancel kiocb we do > kiocb->cancelled = true; > and all bio created from the request will not be send to device anymore. (With lots of comments to let us know that it being racey is fine.) > If the proposal is fine then I start implementing it. For a teeny hack like this the best proposal is a working prototype patch. Don't wait for acks just dive in. - z ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Improving AIO cancellation 2013-02-08 17:38 ` Zach Brown @ 2013-02-08 21:11 ` Kent Overstreet 2013-02-08 21:44 ` Zach Brown 2013-02-12 0:15 ` Anatol Pomozov 0 siblings, 2 replies; 6+ messages in thread From: Kent Overstreet @ 2013-02-08 21:11 UTC (permalink / raw) To: Zach Brown Cc: Anatol Pomozov, linux-fsdevel, Theodore Ts'o, linux-aio, linux-kernel On Fri, Feb 08, 2013 at 09:38:11AM -0800, Zach Brown wrote: > > The draft implementation will look like this. struct bio should have > > some way to get current status of kiocb that generated bio. So we add > > a pointer to bool flag. > > > > struct bio { > > bool *cancelled; > > } > > > > in async DIO codepath this pointer will be initialized with bool at > > "struct kiocb" > > bio->cancelled = &kiocb->cancelled; > > Hmm. I suppose. It sure looks disgusting, but the iocb has forgotten > all of the dio state and bios that lead up to the operation once > submission has completed. It's easy enough to match the lifetime rules > of this reference with end_io's using the iocb. That's part of it (w.r.t. the exact location of the cancelled flag), but the idea also isn't for this to be specific to kiocbs or anything like that. The problem, as you alluded to, is once a bio passes generic_make_request() the upper layer has no idea what the block layer is doing with it. If it's a stacking block device - or even if it's just getting bounced - the first thing that happens to the bio is it gets cloned. So, adding a mechanism for the upper layer to chase down those bios and whatever other state the lower layers created would be gross. Fuck that. And, you know my thoughts on callbacks; Ted pointed out some reasons we are probably going to need cancellation callbacks eventually but I personally _don't_ want this to get designed around callbacks. But if we just add this layer of indirection, when a bio is cloned we can copy the pointer too and cancellation magically Just Works. For md writes and probably some other things we won't be able to just blindly copy the pointer as that'd fuck up md's parity stuff, but md could still check the flag itself if it was worth the trouble. > > This method also lets one cancelled iocb flag many submitted bios as > cancelled, so that's nice. > > > So to cancel kiocb we do > > kiocb->cancelled = true; > > and all bio created from the request will not be send to device anymore. > > (With lots of comments to let us know that it being racey is fine.) Yeah. We should be really explicit about what this flag means; it _doesn't_ mean "this bio has been cancelled", it means "please try to cancel this bio". We don't know if the bio was succesfully cancelled until the bio is completed (and this doesn't change anything about how bio completion works) - if it was cancelled, the bi_end_io callback will get -ECANCELLED or something. This is very different from the existing AIO cancellation; KIF_CANCEL means "this kiocb _has been cancelled_". (sort of, I was just rereading that code and I'm convinced it's buggy). > > If the proposal is fine then I start implementing it. > > For a teeny hack like this the best proposal is a working prototype > patch. Don't wait for acks just dive in. Yeah, and I keep claiming I'm going to implement the AIO bits of this (I keep getting distracted by other things like taking a flamethrower to the dio code). Note that the semantics of this doesn't really match up with io_cancel(). That's not the end of the world, we can paper over that in the aio code without too much grossness (or at least it'll be localized grossness). IMO though, io_cancel() is dumb and we want something new. It's synchronous - and why anyone ever thought cancellation for _asynchronous_ io should be synchronous I don't know. Anyways, if io_cancel() is going to succeed it has to block until the io has been cancelled and it's got a completion - the completion isn't returned via io_getevents(). This makes _no sense_, and means an application that is making use of this probably is going to need a thread pool just to do cancellation. What we want is a new io_cancel_sane() syscall that doesn't return anything, and only marks the iocb as "cancellation pending". The completion would still get returned via io_getevents(), and userspace would find out if it was cancelled or not then. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Improving AIO cancellation 2013-02-08 21:11 ` Kent Overstreet @ 2013-02-08 21:44 ` Zach Brown 2013-02-12 0:15 ` Anatol Pomozov 1 sibling, 0 replies; 6+ messages in thread From: Zach Brown @ 2013-02-08 21:44 UTC (permalink / raw) To: Kent Overstreet Cc: Anatol Pomozov, linux-fsdevel, Theodore Ts'o, linux-aio, linux-kernel > We don't know if the bio was succesfully cancelled until the bio is > completed (and this doesn't change anything about how bio completion > works) - if it was cancelled, the bi_end_io callback will get > -ECANCELLED or something. Yeah, I think that's the right design. > This is very different from the existing AIO cancellation; KIF_CANCEL > means "this kiocb _has been cancelled_". (sort of, I was just rereading > that code and I'm convinced it's buggy). I don't recall the details, but the last time I looked at it I was also convinced that it was buggy. > What we want is a new io_cancel_sane() syscall that doesn't return > anything, and only marks the iocb as "cancellation pending". The > completion would still get returned via io_getevents(), and userspace > would find out if it was cancelled or not then. Agreed. I wonder if we can mash those semantics in to the existing call somehow. This *does* have access to the iocb, so our hands aren't as tied as we are with io_setup()'s inability to request behavioural changes. Maybe we mark the submitted iocbs as ASYNC_BEST_EFFORT_CANCEL and then io_cancel() on them can return success and completion will proceeed as normal. Maybe much more quickly with a cancelled status but maybe not. - z ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Improving AIO cancellation 2013-02-08 21:11 ` Kent Overstreet 2013-02-08 21:44 ` Zach Brown @ 2013-02-12 0:15 ` Anatol Pomozov 1 sibling, 0 replies; 6+ messages in thread From: Anatol Pomozov @ 2013-02-12 0:15 UTC (permalink / raw) To: Kent Overstreet Cc: Zach Brown, linux-fsdevel, Theodore Ts'o, linux-aio, linux-kernel Hi On Fri, Feb 8, 2013 at 1:11 PM, Kent Overstreet <koverstreet@google.com> wrote: > On Fri, Feb 08, 2013 at 09:38:11AM -0800, Zach Brown wrote: >> > The draft implementation will look like this. struct bio should have >> > some way to get current status of kiocb that generated bio. So we add >> > a pointer to bool flag. >> > >> > struct bio { >> > bool *cancelled; >> > } >> > >> > in async DIO codepath this pointer will be initialized with bool at >> > "struct kiocb" >> > bio->cancelled = &kiocb->cancelled; >> >> Hmm. I suppose. It sure looks disgusting, but the iocb has forgotten >> all of the dio state and bios that lead up to the operation once >> submission has completed. It's easy enough to match the lifetime rules >> of this reference with end_io's using the iocb. > > That's part of it (w.r.t. the exact location of the cancelled flag), but > the idea also isn't for this to be specific to kiocbs or anything like > that. > > The problem, as you alluded to, is once a bio passes > generic_make_request() the upper layer has no idea what the block layer > is doing with it. If it's a stacking block device - or even if it's just > getting bounced - the first thing that happens to the bio is it gets > cloned. > > So, adding a mechanism for the upper layer to chase down those bios and > whatever other state the lower layers created would be gross. Fuck that. > And, you know my thoughts on callbacks; Ted pointed out some reasons we > are probably going to need cancellation callbacks eventually but I > personally _don't_ want this to get designed around callbacks. > > But if we just add this layer of indirection, when a bio is cloned we > can copy the pointer too and cancellation magically Just Works. > > For md writes and probably some other things we won't be able to just > blindly copy the pointer as that'd fuck up md's parity stuff, but md > could still check the flag itself if it was worth the trouble. > > > >> >> This method also lets one cancelled iocb flag many submitted bios as >> cancelled, so that's nice. >> >> > So to cancel kiocb we do >> > kiocb->cancelled = true; >> > and all bio created from the request will not be send to device anymore. >> >> (With lots of comments to let us know that it being racey is fine.) > > Yeah. We should be really explicit about what this flag means; it > _doesn't_ mean "this bio has been cancelled", it means "please try to > cancel this bio". > > We don't know if the bio was succesfully cancelled until the bio is > completed (and this doesn't change anything about how bio completion > works) - if it was cancelled, the bi_end_io callback will get > -ECANCELLED or something. > > This is very different from the existing AIO cancellation; KIF_CANCEL > means "this kiocb _has been cancelled_". (sort of, I was just rereading > that code and I'm convinced it's buggy). > >> > If the proposal is fine then I start implementing it. >> >> For a teeny hack like this the best proposal is a working prototype >> patch. Don't wait for acks just dive in. > > Yeah, and I keep claiming I'm going to implement the AIO bits of this > (I keep getting distracted by other things like taking a flamethrower to > the dio code). > > Note that the semantics of this doesn't really match up with > io_cancel(). That's not the end of the world, we can paper over that in > the aio code without too much grossness (or at least it'll be localized > grossness). > > IMO though, io_cancel() is dumb and we want something new. It's > synchronous - and why anyone ever thought cancellation for > _asynchronous_ io should be synchronous I don't know. What do mean "cancel is synchronous"? Are you saying that the cancellation callback (field ki_cancel) is called in io_cancel() and blocks the syscall? In this case it is a rare situation - currently cancel callback is used only in drivers/usb/gadget/inode.c. In all other cases EINVAL is returned. It means that io_cancel always returns the error result for unfinished async io operation run with io_submit(). > > Anyways, if io_cancel() is going to succeed it has to block until the io > has been cancelled and it's got a completion - the completion isn't > returned via io_getevents(). > > This makes _no sense_, and means an application that is making use of > this probably is going to need a thread pool just to do cancellation. > > What we want is a new io_cancel_sane() syscall that doesn't return > anything, and only marks the iocb as "cancellation pending". The > completion would still get returned via io_getevents(), and userspace > would find out if it was cancelled or not then. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-02-12 0:15 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-02-08 3:42 Improving AIO cancellation Anatol Pomozov 2013-02-08 8:32 ` Bart Van Assche 2013-02-08 17:38 ` Zach Brown 2013-02-08 21:11 ` Kent Overstreet 2013-02-08 21:44 ` Zach Brown 2013-02-12 0:15 ` Anatol Pomozov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).