linux-fsdevel.vger.kernel.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [git pull] iov_iter fixes
Date: Fri, 10 Sep 2021 09:04:26 -0600
Message-ID: <4b26d8cd-c3fa-8536-a295-850ecf052ecd@kernel.dk>
In-Reply-To: <9855f69b-e67e-f7d9-88b8-8941666ab02f@kernel.dk>

On 9/10/21 7:57 AM, Jens Axboe wrote:
> On 9/9/21 9:36 PM, Al Viro wrote:
>> On Thu, Sep 09, 2021 at 09:30:03PM -0600, Jens Axboe wrote:
>>
>>>> Again, we should never, ever modify the iovec (or bvec, etc.) array in
>>>> ->read_iter()/->write_iter()/->sendmsg()/etc. instances.  If you see
>>>> such behaviour anywhere, report it immediately.  Any such is a blatant
>>>> bug.
>>>
>>> Yes that was wrong, the iovec is obviously const. But that really
>>> doesn't change the original point, which was that copying the iov_iter
>>> itself unconditionally would be miserable.
>>
>> Might very well be true, but... won't your patch hit the reimport on
>> every short read?  And the cost of uaccess in there is *much* higher
>> than copying of 48 bytes into local variable...
>>
>> Or am I misreading your patch?  Note that short reads on reaching
>> EOF are obviously normal - it's not a rare case at all.
> 
> It was just a quick hack, might very well be too eager to go through
> those motions. But pondering this instead of sleeping, we don't need to
> copy all of iov_iter in order to restore the state, and we can use the
> same advance after restoring. So something like this may be more
> palatable. Caveat - again untested, and I haven't tested the performance
> impact of this at all.

Passes basic testing for me. I added a sysctl switch for this so I can
compare performance, running my usual peak-perf-single-core benchmark.
That one does ~3.5M IOPS, using polled IO. There's always a slight
variability between boots and builds, hence the sysctl so I could
toggle this behavior on the fly.

Did a few runs, and the differences are very stable. With this enabled,
we spend about 0.15% more time in io_read(). That's only worth about
5K IOPS at 3.5M, not enough to notice as the variation for the 1 second
reporting window usually swings more than that:

Old behavior:
IOPS=3536512, IOS/call=32/31, inflight=(75)
IOPS=3541888, IOS/call=32/32, inflight=(64)
IOPS=3529056, IOS/call=32/31, inflight=(119)
IOPS=3521184, IOS/call=32/32, inflight=(96)
IOPS=3527456, IOS/call=32/31, inflight=(128)
IOPS=3525504, IOS/call=32/32, inflight=(128)
IOPS=3524288, IOS/call=32/32, inflight=(128)
IOPS=3536192, IOS/call=32/32, inflight=(96)
IOPS=3535840, IOS/call=32/32, inflight=(96)
IOPS=3533728, IOS/call=32/31, inflight=(128)
IOPS=3528384, IOS/call=32/32, inflight=(128)
IOPS=3518400, IOS/call=32/32, inflight=(64)

Turning it on:
IOPS=3533824, IOS/call=32/31, inflight=(64)
IOPS=3541408, IOS/call=32/32, inflight=(32)
IOPS=3533024, IOS/call=32/31, inflight=(64)
IOPS=3528672, IOS/call=32/32, inflight=(35)
IOPS=3522272, IOS/call=32/31, inflight=(107)
IOPS=3517632, IOS/call=32/32, inflight=(57)
IOPS=3516000, IOS/call=32/31, inflight=(96)
IOPS=3513568, IOS/call=32/32, inflight=(34)
IOPS=3525600, IOS/call=32/31, inflight=(96)
IOPS=3527136, IOS/call=32/31, inflight=(101)

I think that's tolerable; it was never going to be absolutely free.
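
To put rough numbers on that, here is a small sketch that averages the
two sets of samples above and compares the difference with the swing
inside a single run. The helper names (mean, spread, mean_drop) are made
up for illustration and are not part of any benchmark tool.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Per-second IOPS samples copied from the two runs listed above. */
const double old_iops[] = {
	3536512, 3541888, 3529056, 3521184, 3527456, 3525504,
	3524288, 3536192, 3535840, 3533728, 3528384, 3518400,
};
const double new_iops[] = {
	3533824, 3541408, 3533024, 3528672, 3522272,
	3517632, 3516000, 3513568, 3525600, 3527136,
};

/* Arithmetic mean of n samples; n must be non-zero. */
double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Largest minus smallest sample: the swing within one run. */
double spread(const double *v, size_t n)
{
	double lo, hi;

	assert(n > 0);
	lo = hi = v[0];
	for (size_t i = 1; i < n; i++) {
		if (v[i] < lo)
			lo = v[i];
		if (v[i] > hi)
			hi = v[i];
	}
	return hi - lo;
}

/* Mean IOPS lost with the new behavior turned on. */
double mean_drop(void)
{
	return mean(old_iops, ARRAY_LEN(old_iops)) -
	       mean(new_iops, ARRAY_LEN(new_iops));
}
```

By my arithmetic the mean drops by roughly 4K IOPS, while the samples
within either run swing by more than 20K, which is why the difference
doesn't stand out in the per-second output.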

What do you think of this approach? Parts of iov_iter are going to
remain constant, like iter_type and data_source. io_uring already copies
iter->count, so that just leaves restoring iov (and unionized friends),
the nr_segs union, and iov_offset.

We could pretty this up and have the state part be explicit in iov_iter,
and have the store/restore parts end up in uio.h. That'd tie them closer
together, though I don't expect iov_iter changes to be an issue. It
would make it more maintainable, though. I'll try and hack up this
generic solution, see if that looks any better.
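
As a rough userspace illustration of the save/restore idea (not the
actual patch), the sketch below models only the iovec flavor of the
iterator. Everything prefixed with mock_ is a hypothetical name invented
here; the struct layouts are assumptions and do not mirror the kernel's
iov_iter.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical stand-in for iov_iter, reduced to the fields discussed. */
struct mock_iter {
	const struct iovec *iov;	/* current segment; array is never modified */
	unsigned long nr_segs;		/* segments remaining */
	size_t iov_offset;		/* offset into the current segment */
	size_t count;			/* bytes remaining */
};

/* The mutable part that needs saving; the layout is an assumption. */
struct mock_iter_state {
	const struct iovec *iov;
	unsigned long nr_segs;
	size_t iov_offset;
	size_t count;
};

/* Snapshot the mutable fields before issuing the IO. */
void mock_iter_save(const struct mock_iter *i, struct mock_iter_state *s)
{
	s->iov = i->iov;
	s->nr_segs = i->nr_segs;
	s->iov_offset = i->iov_offset;
	s->count = i->count;
}

/* Put the iterator back to where it was when the snapshot was taken. */
void mock_iter_restore(struct mock_iter *i, const struct mock_iter_state *s)
{
	/* The iterator only moves forward, so it cannot have gained segments. */
	assert(i->nr_segs <= s->nr_segs);
	i->iov = s->iov;
	i->nr_segs = s->nr_segs;
	i->iov_offset = s->iov_offset;
	i->count = s->count;
}

/* Simplified advance: consume 'bytes' from the front of the iterator. */
void mock_iter_advance(struct mock_iter *i, size_t bytes)
{
	assert(bytes <= i->count);
	i->count -= bytes;
	bytes += i->iov_offset;
	while (i->nr_segs && bytes >= i->iov->iov_len) {
		bytes -= i->iov->iov_len;
		i->iov++;
		i->nr_segs--;
	}
	i->iov_offset = bytes;
}
```

The intended use on a short read would be to restore the snapshot and
then advance by the number of bytes that actually completed, so the
retry starts from a known position no matter how far the lower layers
moved the iterator.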

-- 
Jens Axboe

