linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>,
	Olivier Langlois <olivier@trillion01.com>,
	io-uring@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: I/O cancellation in io-uring (was: io_uring: clear TIF_NOTIFY_SIGNAL ...)
Date: Tue, 10 Aug 2021 22:40:58 -0700	[thread overview]
Message-ID: <29430B77-37C2-4E65-B279-2590CF825FB3@gmail.com> (raw)
In-Reply-To: <1bf56100-e904-65b5-bbb8-fa313d85b01a@kernel.dk>


> On Aug 10, 2021, at 7:51 PM, Jens Axboe <axboe@kernel.dk> wrote:
> 
> There's no way to cancel file/bdev related IO, and there likely never
> will be. That's basically the only exception, everything else can get
> canceled pretty easily. Many things can be written on why that is the
> case, and they have (myself included), but it boils down to proper
> hardware support which we'll likely never have as it's not a well tested
> path. For other kind of async IO, we're either waiting in poll (which is
> trivially cancellable) or in an async thread (which is also easily
> cancellable). For DMA+irq driven block storage, we'd need to be able to
> reliably cancel on the hardware side, to prevent errant DMA after the
> fact.
> 
> None of this is really io_uring specific, io_uring just suffers from the
> same limitations as others would (or are).
> 
>> Otherwise they might potentially never complete, as happens in my
>> use-case.
> 
> If you see that, that is most certainly a bug. While bdev/reg file IO
> can't really be canceled, they all have the property that they complete
> in finite time. Either the IO completes normally in a "short" amount of
> time, or a timeout will cancel it and complete it in error. There are no
> unbounded execution times for uncancellable IO.

I understand your point that hardware reads/writes cannot easily be cancelled.
(well, the buffers can be unmapped from the IOMMU tables, but let's put this
discussion aside).

Yet the question is whether reads/writes from special files such as pipes,
eventfd, signalfd, fuse should be cancellable. Obviously, it is always
possible to issue a blocking read/write from a worker thread. Yet, there are
inherent overheads that are associated with this design, specifically
context-switches. While the overhead of a context-switch is not as high as
portrayed by some, it is still high for low latency use-cases.

There is a potential alternative, however. When a write to a pipe is
performed, or when an event takes place or signal sent, queued io-uring reads
can be fulfilled immediately, without a context-switch to a worker. I
specifically want to fulfill userfaultfd reads and notify userspace on
page-faults in such manner. I do not have the numbers in front of me, but
doing so shows considerable performance improvement.

To allow such use-cases, cancellation of the read/write is needed. A read from
a pipe might never complete if there is no further writes to the pipe.
Cancellation is not hard to implement for such cases (it's only the mess with
the existing AIO's ki_cancel() that bothers me, but as you said - it is a
single use-case).

Admittedly, there are no such use-cases in the kernel today, but arguably,
this is due to the lack of infrastructure. I see no alternative which is as
performant as the one I propose here. Using poll+read or any other option
will have unwarranted overhead.  

If an RFC might convince you, or some more mainstream use-case such as queued
pipe reads would convince you, I can work on such in order to try to get
something like that.


  reply	other threads:[~2021-08-11  5:41 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-08  0:13 [PATCH 0/2] io_uring: bug fixes Nadav Amit
2021-08-08  0:13 ` [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task work Nadav Amit
2021-08-08 12:55   ` Pavel Begunkov
2021-08-08 17:31     ` Nadav Amit
2021-08-09  4:07       ` Hao Xu
2021-08-09  4:50         ` Nadav Amit
2021-08-09 10:35           ` Pavel Begunkov
2021-08-09 10:18       ` Pavel Begunkov
2021-08-09 21:48   ` Olivier Langlois
2021-08-10  8:28     ` Nadav Amit
2021-08-10 13:33       ` Olivier Langlois
2021-08-10 21:32       ` Pavel Begunkov
2021-08-11  2:33         ` Nadav Amit
2021-08-11  2:51           ` Jens Axboe
2021-08-11  5:40             ` Nadav Amit [this message]
2021-08-08  0:13 ` [PATCH 2/2] io_uring: Use WRITE_ONCE() when writing to sq_flags Nadav Amit
2021-08-09 13:53 ` [PATCH 0/2] io_uring: bug fixes Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=29430B77-37C2-4E65-B279-2590CF825FB3@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=olivier@trillion01.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).