linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Jens Axboe <axboe@kernel.dk>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [git pull] iov_iter fixes
Date: Thu, 9 Sep 2021 23:54:58 +0100	[thread overview]
Message-ID: <84c85780-fe43-e95b-312d-b7671c65a7aa@gmail.com> (raw)
In-Reply-To: <CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com>

On 9/9/21 8:37 PM, Linus Torvalds wrote:
> On Wed, Sep 8, 2021 at 9:24 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>>         Fixes for io-uring handling of iov_iter reexpands
> 
> Ugh.
> 
> I have pulled this, because I understand what it does and I agree it
> fixes a bug, but it really feels very very hacky and wrong to me.

Maybe was worded not too clearly, my apologies.


> It really smells like io-uring is doing a "iov_iter_revert()" using a
> number that it pulls incorrectly out of its arse.

It's not invented by io_uring,

filemap.c : generic_file_direct_[write,read]()

do the same thing. Also, the block layer was not re-expanding before
~5.12, so it looks it was possible to trigger a similar thing without
io_uring, but I haven't tried to reproduce. Was mentioned in the
cover-letter.

> So when io-uring does that
> 
>                 iov_iter_revert(iter, io_size - iov_iter_count(iter));
> 
> what it *really* wants to do is just basically "iov_iter_reset(iter)".
> 
> And that's basically what that addition of that "iov_iter_reexpand()"
> tries to effectively do.
> 
> Wouldn't it be better to have a function that does exactly that?
> 
> Alternatively (and I'm cc'ing Jens) is is not possible for the
> io-uring code to know how many bytes it *actually* used, rather than
> saying that "ok, the iter originally had X bytes, now it has Y bytes,
> so it must have used X-Y bytes" which was actively wrong for the case
> where something ended up truncating the IO for some reason.
> 
> Because I note that io-uring does that
> 
>         /* may have left rw->iter inconsistent on -EIOCBQUEUED */
>         iov_iter_revert(&rw->iter, req->result - iov_iter_count(&rw->iter));
> 
> in io_resubmit_prep() too, and that you guys missed that it's the
> exact same issue, and needs that exact same iov_iter_reexpand().

Right. It was covered by v1-v2, which were failing requests with
additional fallback in v2 [1], but I dropped in v3 [2] because there
is a difference. Namely io_resubmit_prep() might be called deeply down
the stack, e.g. in the block layer.

It was intended to get fixed once the first part is merged, and I do
believe that was the right approach, because there were certain
communication delays. The first version was posted a month ago, but
we missed the merged window. It appeared to me that if we get anything
more complex 

 



do that at the bottom of stack.




 how deep in the stack we do that. It was indended to
be 


[1] https://lkml.org/lkml/2021/8/12/620
[2] https://lkml.org/lkml/2021/8/23/285

> 
> That "req->result" is once again the *original* length, and the above
> code once again mis-handles the case of "oh, the iov got truncated
> because of some IO limit".
> 
> So I've pulled this, but I think it is
> 
>  (a) ugly nasty
> 
>  (b) incomplete and misses a case
> 
> and needs more thought. At the VERY least it needs that
> iov_iter_reexpand() in io_resubmit_prep() too, I think.
> 
> I'd like the comments expanded too. In particular that
> 
>                 /* some cases will consume bytes even on error returns */
> 
> really should expand on the "some cases" thing, and why such an error
> isn't fatal buye should be retried asynchronously blindly like this?
> 
> Because I think _that_ is part of the fundamental issue here - the
> io_uring code tries to just blindly re-submit the whole thing, and it
> does it very badly and actually incorrectly.
> 
> Or am I missing something?
> 
>            Linus
> 

-- 
Pavel Begunkov

  parent reply	other threads:[~2021-09-09 22:55 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-09  4:22 [git pull] iov_iter fixes Al Viro
2021-09-09 19:37 ` Linus Torvalds
2021-09-09 21:19   ` Jens Axboe
2021-09-09 21:39     ` Jens Axboe
2021-09-09 21:56       ` Linus Torvalds
2021-09-09 22:21         ` Jens Axboe
2021-09-09 22:56           ` Linus Torvalds
2021-09-10  1:35             ` Jens Axboe
2021-09-10  2:43               ` Jens Axboe
2021-09-10  2:48               ` Al Viro
2021-09-10  3:06                 ` Jens Axboe
2021-09-10  3:15                   ` Al Viro
2021-09-10  3:23                     ` Jens Axboe
2021-09-10  3:24                     ` Al Viro
2021-09-10  3:28                       ` Jens Axboe
2021-09-13 15:29                 ` David Laight
2021-09-09 21:42     ` Dave Chinner
2021-09-10  2:57     ` Al Viro
2021-09-10  3:05       ` Jens Axboe
2021-09-10  3:11         ` Al Viro
2021-09-10  3:22           ` Jens Axboe
2021-09-10  3:27             ` Al Viro
2021-09-10  3:30               ` Jens Axboe
2021-09-10  3:36                 ` Al Viro
2021-09-10 13:57                   ` Jens Axboe
2021-09-10 14:42                     ` Al Viro
2021-09-10 15:08                       ` Jens Axboe
2021-09-10 15:32                         ` Al Viro
2021-09-10 15:36                           ` Jens Axboe
2021-09-10 15:04                     ` Jens Axboe
2021-09-10 16:06                       ` Jens Axboe
2021-09-10 16:44                         ` Linus Torvalds
2021-09-10 16:56                         ` Al Viro
2021-09-10 16:58                           ` Linus Torvalds
2021-09-10 17:26                             ` Jens Axboe
2021-09-10 17:31                               ` Linus Torvalds
2021-09-10 17:32                                 ` Jens Axboe
2021-09-10 18:48                                 ` Al Viro
2021-09-10 19:04                                   ` Linus Torvalds
2021-09-10 19:10                                     ` Linus Torvalds
2021-09-10 19:10                                   ` Jens Axboe
2021-09-10 17:04                           ` Jens Axboe
2021-09-09 22:54   ` Pavel Begunkov [this message]
2021-09-09 22:57     ` Pavel Begunkov
2021-09-09 23:14   ` Pavel Begunkov
2021-09-09 20:03 ` pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84c85780-fe43-e95b-312d-b7671c65a7aa@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).