From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>, io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [PATCH 04/15] io_uring: re-issue block requests that failed because of resources
Date: Fri, 19 Jun 2020 08:36:41 -0600	[thread overview]
Message-ID: <e7f492a3-e180-ccf6-fcf2-2cd7f318f152@kernel.dk>
In-Reply-To: <105a78f0-407f-09e3-5951-7f76756762b2@gmail.com>

On 6/19/20 8:30 AM, Pavel Begunkov wrote:
> On 19/06/2020 17:22, Jens Axboe wrote:
>> On 6/19/20 8:12 AM, Pavel Begunkov wrote:
>>> On 18/06/2020 17:43, Jens Axboe wrote:
>>>> Mark the plug with nowait == true, which will cause requests to avoid
>>>> blocking on request allocation. If a request would have blocked, it
>>>> now fails instead; we catch that and reissue it from a task_work based
>>>> handler.
>>>>
>>>> Normally we can catch -EAGAIN directly, but the hard case is for split
>>>> requests. As an example, the application issues a 512KB request. The
>>>> block core will split this into 128KB chunks if that's the max size for
>>>> the device. The first request issues just fine, but we run into -EAGAIN
>>>> for some later splits of the same request. As the bio is split, we don't
>>>> get to see the -EAGAIN until one of the actual reads completes, and
>>>> hence we cannot handle it inline as part of submission.
>>>>
>>>> This does potentially cause re-reads of parts of the range, as the whole
>>>> request is reissued. There's currently no better way to handle this.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>> ---
>>>>  fs/io_uring.c | 148 ++++++++++++++++++++++++++++++++++++++++++--------
>>>>  1 file changed, 124 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 2e257c5a1866..40413fb9d07b 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -900,6 +900,13 @@ static int io_file_get(struct io_submit_state *state, struct io_kiocb *req,
>>>>  static void __io_queue_sqe(struct io_kiocb *req,
>>>>  			   const struct io_uring_sqe *sqe);
>>>>  
>>> ...
>>>> +
>>>> +static void io_rw_resubmit(struct callback_head *cb)
>>>> +{
>>>> +	struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work);
>>>> +	struct io_ring_ctx *ctx = req->ctx;
>>>> +	int err;
>>>> +
>>>> +	__set_current_state(TASK_RUNNING);
>>>> +
>>>> +	err = io_sq_thread_acquire_mm(ctx, req);
>>>> +
>>>> +	if (io_resubmit_prep(req, err)) {
>>>> +		refcount_inc(&req->refs);
>>>> +		io_queue_async_work(req);
>>>> +	}
>>>
>>> Hmm, I have similar stuff but for iopoll, plus removal of grab_env* for
>>> linked reqs and some extras on top. I think I'll rebase on top of this.
>>
>> Yes, there's certainly overlap there. I consider this series basically
>> wrapped up, so feel free to just base on top of it.
>>
>>>> +static bool io_rw_reissue(struct io_kiocb *req, long res)
>>>> +{
>>>> +#ifdef CONFIG_BLOCK
>>>> +	struct task_struct *tsk;
>>>> +	int ret;
>>>> +
>>>> +	if ((res != -EAGAIN && res != -EOPNOTSUPP) || io_wq_current_is_worker())
>>>> +		return false;
>>>> +
>>>> +	tsk = req->task;
>>>> +	init_task_work(&req->task_work, io_rw_resubmit);
>>>> +	ret = task_work_add(tsk, &req->task_work, true);
>>>
>>> I don't like that the request becomes un-discoverable for cancellation
>>> while sitting in the task_work list. The poll stuff at least has a
>>> hash_node for that.
>>
>> Async buffered IO was never cancelable, so it doesn't really matter.
>> It's tied to the task, so we know it'll get executed - either run, or
>> canceled if the task is going away. This is really not that different
>> from having the work discoverable through io-wq queueing before, since
>> the latter could never be canceled anyway as it sits there
>> uninterruptibly waiting for IO completion.
> 
> Makes sense. I was thinking about using this task-requeue for all kinds
> of requests. Though, instead of speculating, it'd be better for me to
> turn the ideas into patches and see.

And that's fine; for requests where it matters, on-the-side
discoverability can still be a thing. If we're in the task itself where
the work is queued, that provides us safety from the work going away
from under us. Then we just have to mark it appropriately, if it needs
to get canceled instead of run to completion.
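
Something like the below is what I mean, purely as an untested sketch -
REQ_F_CANCEL_PENDING is a made-up flag for illustration, not something
that exists in the tree:

/* Sketch only: REQ_F_CANCEL_PENDING is hypothetical. If the flag got
 * set before the task_work runs, complete the request with -ECANCELED
 * instead of reissuing it. */
static void io_rw_resubmit(struct callback_head *cb)
{
	struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work);

	if (req->flags & REQ_F_CANCEL_PENDING) {
		io_cqring_add_event(req, -ECANCELED);
		io_put_req(req);
		return;
	}

	/* not marked for cancelation, reissue as before */
	...
}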

Some care needed, of course, but there's nothing that would prevent this
from working. Ideally we'd be able to peel off a task_work entry, but
that's kind of difficult with the singly linked non-locked list.
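
For reference, the queueing side is just a lockless LIFO push, roughly
like the below (paraphrasing kernel/task_work.c, not the verbatim
source). Each entry has a single forward link, and the list head is
published with cmpxchg() with no lock held, which is why unlinking an
arbitrary entry from the middle isn't straightforward:

	struct callback_head *head;

	/* lockless push of 'work' onto the task's task_works list */
	do {
		head = READ_ONCE(task->task_works);
		work->next = head;
	} while (cmpxchg(&task->task_works, head, work) != head);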

-- 
Jens Axboe




Thread overview: 38+ messages
2020-06-18 14:43 Jens Axboe
2020-06-18 14:43 ` [PATCH 01/15] block: provide plug based way of signaling forced no-wait semantics Jens Axboe
2020-06-18 14:43 ` [PATCH 02/15] io_uring: always plug for any number of IOs Jens Axboe
2020-06-18 14:43 ` [PATCH 03/15] io_uring: catch -EIO from buffered issue request failure Jens Axboe
2020-06-18 14:43 ` [PATCH 04/15] io_uring: re-issue block requests that failed because of resources Jens Axboe
2020-06-19 14:12   ` Pavel Begunkov
2020-06-19 14:22     ` Jens Axboe
2020-06-19 14:30       ` Pavel Begunkov
2020-06-19 14:36         ` Jens Axboe [this message]
2020-06-18 14:43 ` [PATCH 05/15] mm: allow read-ahead with IOCB_NOWAIT set Jens Axboe
2020-06-24  1:02   ` Dave Chinner
2020-06-24  1:46     ` Matthew Wilcox
2020-06-24 15:00       ` Jens Axboe
2020-06-24 15:35         ` Jens Axboe
2020-06-24 16:41           ` Matthew Wilcox
2020-06-24 16:44             ` Jens Axboe
2020-07-07 11:38               ` Andreas Grünbacher
2020-07-07 14:31                 ` Jens Axboe
2020-08-10 22:56               ` Dave Chinner
2020-08-10 23:03                 ` Jens Axboe
2020-06-24  4:38   ` Dave Chinner
2020-06-24 15:01     ` Jens Axboe
2020-06-18 14:43 ` [PATCH 06/15] mm: abstract out wake_page_match() from wake_page_function() Jens Axboe
2020-06-18 14:43 ` [PATCH 07/15] mm: add support for async page locking Jens Axboe
2020-07-07 11:32   ` Andreas Grünbacher
2020-07-07 14:32     ` Jens Axboe
2020-06-18 14:43 ` [PATCH 08/15] mm: support async buffered reads in generic_file_buffered_read() Jens Axboe
2020-06-18 14:43 ` [PATCH 09/15] fs: add FMODE_BUF_RASYNC Jens Axboe
2020-06-18 14:43 ` [PATCH 10/15] block: flag block devices as supporting IOCB_WAITQ Jens Axboe
2020-06-18 14:43 ` [PATCH 11/15] xfs: flag files as supporting buffered async reads Jens Axboe
2020-06-18 14:43 ` [PATCH 12/15] btrfs: " Jens Axboe
2020-06-19 11:11   ` David Sterba
2020-06-18 14:43 ` [PATCH 13/15] ext4: flag " Jens Axboe
2020-06-18 14:43 ` [PATCH 14/15] mm: add kiocb_wait_page_queue_init() helper Jens Axboe
2020-06-18 14:43 ` [PATCH 15/15] io_uring: support true async buffered reads, if file provides it Jens Axboe
2020-06-23 12:39   ` Pavel Begunkov
2020-06-23 14:38     ` Jens Axboe
2020-06-18 14:45 ` [PATCHSET v7 0/12] Add support for async buffered reads Jens Axboe
