From: Jens Axboe <axboe@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Pavel Begunkov <asml.silence@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [git pull] iov_iter fixes
Date: Thu, 9 Sep 2021 15:19:56 -0600
Message-ID: <5971af96-78b7-8304-3e25-00dc2da3c538@kernel.dk>
In-Reply-To: <CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com>
On 9/9/21 1:37 PM, Linus Torvalds wrote:
> On Wed, Sep 8, 2021 at 9:24 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Fixes for io-uring handling of iov_iter reexpands
>
> Ugh.
>
> I have pulled this, because I understand what it does and I agree it
> fixes a bug, but it really feels very very hacky and wrong to me.
>
> It really smells like io-uring is doing a "iov_iter_revert()" using a
> number that it pulls incorrectly out of its arse.
>
> So when io-uring does that
>
> iov_iter_revert(iter, io_size - iov_iter_count(iter));
>
> what it *really* wants to do is just basically "iov_iter_reset(iter)".
>
> And that's basically what that addition of that "iov_iter_reexpand()"
> tries to effectively do.
>
> Wouldn't it be better to have a function that does exactly that?
That might indeed be better. Alternatively, consumers that truncate
should expand. Part of the problem here is the inconsistency in how they
are consumed.
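
To make the failure mode concrete, here is a userspace toy model, not kernel code. It assumes a single flat buffer instead of real iovec segments, and every toy_* name is hypothetical. It shows how the "original size minus remaining count" formula over-estimates the rewind once something has truncated the iterator, and what a plain reset would do instead.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct iov_iter: one flat buffer, no segments. */
struct toy_iter {
    size_t pos;    /* bytes consumed since setup */
    size_t count;  /* bytes the iterator still claims to hold */
};

/* Clamp the remaining length, as a lower layer might for an I/O limit. */
static void toy_truncate(struct toy_iter *it, size_t max)
{
    if (it->count > max)
        it->count = max;
}

/* Consume n bytes. */
static void toy_advance(struct toy_iter *it, size_t n)
{
    assert(n <= it->count);
    it->pos += n;
    it->count -= n;
}

/* What the "original size minus what is left" formula asks to rewind. */
static size_t toy_guessed_unroll(const struct toy_iter *it, size_t io_size)
{
    return io_size - it->count;
}

/* Hypothetical reset: back to the start, with the full original length. */
static void toy_reset(struct toy_iter *it, size_t io_size)
{
    it->pos = 0;
    it->count = io_size;
}
```

With io_size = 4096, a truncate to 1024 and 512 bytes consumed, pos is 512 but the formula asks to rewind 3584 bytes, which is more than was ever consumed. toy_reset() does not depend on the current count at all, so truncation cannot confuse it.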
> Alternatively (and I'm cc'ing Jens) is it not possible for the
> io-uring code to know how many bytes it *actually* used, rather than
> saying that "ok, the iter originally had X bytes, now it has Y bytes,
> so it must have used X-Y bytes" which was actively wrong for the case
> where something ended up truncating the IO for some reason.
Not sure how we'd do that, outside of stupid tricks like copying the
iov_iter before we pass it down. But that's obviously not going to be
very efficient. Hence we're left with having some way to reset/reexpand,
even in the presence of someone having done truncate on it.
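
For illustration, the copy trick can be sketched as a userspace toy (not kernel code; the flat-buffer model and all toy_* names are hypothetical). The caller snapshots the iterator before handing it down and puts the snapshot back on failure, so it never has to guess how much was consumed or truncated. In the toy the snapshot is two words; the efficiency concern is that the real structure is larger and the copy would be paid on every submission.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct iov_iter: one flat buffer, no segments. */
struct toy_iter {
    size_t pos;    /* bytes consumed since setup */
    size_t count;  /* bytes the iterator still claims to hold */
};

/*
 * Stand-in lower layer: clamps the I/O to 'limit', consumes 'eaten'
 * bytes, then fails without undoing either step.
 */
static int toy_lower_layer(struct toy_iter *it, size_t limit, size_t eaten)
{
    if (it->count > limit)
        it->count = limit;
    assert(eaten <= it->count);
    it->pos += eaten;
    it->count -= eaten;
    return -1;
}

/* Copy the iterator first; on failure restore it exactly as it was. */
static int toy_submit_with_copy(struct toy_iter *it, size_t limit, size_t eaten)
{
    const struct toy_iter saved = *it;
    int ret = toy_lower_layer(it, limit, eaten);

    if (ret < 0)
        *it = saved;
    return ret;
}
```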
> Because I note that io-uring does that
>
> /* may have left rw->iter inconsistent on -EIOCBQUEUED */
> iov_iter_revert(&rw->iter, req->result - iov_iter_count(&rw->iter));
>
> in io_resubmit_prep() too, and that you guys missed that it's the
> exact same issue, and needs that exact same iov_iter_reexpand().
I think you're right on that one, there's no difference between that use
case and the other two...
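
A minimal sketch of the reexpand-then-revert sequence, again as a userspace toy rather than kernel code. It assumes the iterator remembers how many bytes truncation has hidden (the 'truncated' field here is part of the toy model), and all toy_* names are hypothetical. The same two steps would apply at each call site that currently reverts by "original length minus remaining count".

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct iov_iter, plus a record of truncated bytes. */
struct toy_iter {
    size_t pos;        /* bytes consumed since setup */
    size_t count;      /* bytes the iterator still claims to hold */
    size_t truncated;  /* bytes hidden by truncation (toy assumption) */
};

/* Clamp the remaining length and remember how much was hidden. */
static void toy_truncate(struct toy_iter *it, size_t max)
{
    if (max < it->count) {
        it->truncated += it->count - max;
        it->count = max;
    }
}

/* Consume n bytes. */
static void toy_advance(struct toy_iter *it, size_t n)
{
    assert(n <= it->count);
    it->pos += n;
    it->count -= n;
}

/* Grow the remaining length back, giving up hidden bytes accordingly. */
static void toy_reexpand(struct toy_iter *it, size_t count)
{
    assert(count >= it->count);
    assert(count - it->count <= it->truncated);
    it->truncated -= count - it->count;
    it->count = count;
}

/* Step back over bytes that were consumed. */
static void toy_revert(struct toy_iter *it, size_t unroll)
{
    assert(unroll <= it->pos);
    it->pos -= unroll;
    it->count += unroll;
}

/* Undo any truncation first, so the size difference equals bytes consumed. */
static void toy_rewind(struct toy_iter *it, size_t io_size)
{
    toy_reexpand(it, it->count + it->truncated);
    toy_revert(it, io_size - it->count);
}
```

With a 4096-byte request truncated to 1024 and 512 bytes consumed, toy_rewind() first reexpands the count to 3584, then reverts 4096 - 3584 = 512 bytes, which matches what was actually consumed.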
> That "req->result" is once again the *original* length, and the above
> code once again mis-handles the case of "oh, the iov got truncated
> because of some IO limit".
>
> So I've pulled this, but I think it is
>
> (a) ugly nasty
>
> (b) incomplete and misses a case
>
> and needs more thought. At the VERY least it needs that
> iov_iter_reexpand() in io_resubmit_prep() too, I think.
>
> I'd like the comments expanded too. In particular that
>
> /* some cases will consume bytes even on error returns */
That comment is from me, and it goes back a few years. IIRC, it was the
iomap or xfs code that I hit this with, but honestly I don't remember
all the details at this point. I can try and play with it and see if it
still reproduces.
> really should expand on the "some cases" thing, and why such an error
> isn't fatal but should be retried asynchronously blindly like this?
That would certainly make it easier to handle, as we'd never need to
care at that point. Ideally, return 'bytes_consumed' or error. It might
have been a case of -EAGAIN after truncate, I'll have to dig a bit to
find it again. Outside of that error, we don't retry as there's no point
in doing so.
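
One way to picture a "bytes consumed or error" contract, as a userspace toy under stated assumptions (flat buffer, hypothetical toy_* names, TOY_EAGAIN as a stand-in error number, and a 'fail' flag that only exists to drive the sketch). A non-negative return means the iterator advanced by exactly that many bytes; a negative return means the iterator is exactly as the caller passed it in, so a retry needs no rewind arithmetic.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_EAGAIN 11  /* stand-in error number for the sketch */

/* Toy stand-in for struct iov_iter: one flat buffer, no segments. */
struct toy_iter {
    size_t pos;    /* bytes consumed since setup */
    size_t count;  /* bytes the iterator still claims to hold */
};

/*
 * Hypothetical callee honouring the contract:
 *   ret >= 0: that many bytes were consumed, iterator advanced by ret
 *   ret <  0: error, iterator left untouched
 */
static long toy_do_io(struct toy_iter *it, size_t limit, size_t eaten, int fail)
{
    const struct toy_iter orig = *it;

    if (it->count > limit)
        it->count = limit;           /* truncate to the I/O limit */
    assert(eaten <= it->count);
    it->pos += eaten;
    it->count -= eaten;

    if (fail) {
        *it = orig;                  /* undo truncation and consumption */
        return -TOY_EAGAIN;
    }

    it->count = orig.count - eaten;  /* expand back past the truncation */
    return (long)eaten;
}
```

In this toy the callee that truncates is also the one that expands again, so the caller only ever looks at the return value.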
> Because I think _that_ is part of the fundamental issue here - the
> io_uring code tries to just blindly re-submit the whole thing, and it
> does it very badly and actually incorrectly.
>
> Or am I missing something?
I think the key point here is re-figuring out where the
consumption-on-error comes from. If it just ends up being a truncated
iov, that's all good and fine. If not, that feels like a bug somewhere
else that needs fixing.
--
Jens Axboe