All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, Andres Freund <andres@anarazel.de>
Cc: io-uring@vger.kernel.org
Subject: Re: Deduplicate io_*_prep calls?
Date: Mon, 24 Feb 2020 18:44:08 +0300	[thread overview]
Message-ID: <23a49bca-26a6-ddbd-480b-d7f3caa16c29@gmail.com> (raw)
In-Reply-To: <933f2211-d395-fa84-59ae-0b2e725df613@kernel.dk>


[-- Attachment #1.1: Type: text/plain, Size: 4939 bytes --]

On 24/02/2020 18:40, Jens Axboe wrote:
> On 2/24/20 12:12 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2020-02-23 20:52:26 -0700, Jens Axboe wrote:
>>> The fast case is not being deferred, that's by far the common (and hot)
>>> case, which means io_issue() is called with sqe != NULL. My worry is
>>> that by moving it into a prep helper, the compiler isn't smart enough to
>>> not make that basically two switches.
>>
>> I'm not sure that benefit of a single switch isn't offset by the lower
>> code density due to the additional per-opcode branches.  Not inlining
>> the prepare function results in:
>>
>> $ size fs/io_uring.o fs/io_uring.before.o
>>    text	   data	    bss	    dec	    hex	filename
>>   75383	   8237	      8	  83628	  146ac	fs/io_uring.o
>>   76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
>>
>> symbol size
>> -io_close_prep 0000000000000066
>> -io_connect_prep 0000000000000051
>> -io_epoll_ctl_prep 0000000000000051
>> -io_issue_sqe 0000000000001101
>> +io_issue_sqe 0000000000000de9
>> -io_openat2_prep 00000000000000ed
>> -io_openat_prep 0000000000000089
>> -io_poll_add_prep 0000000000000056
>> -io_prep_fsync 0000000000000053
>> -io_prep_sfr 000000000000004e
>> -io_read_prep 00000000000000ca
>> -io_recvmsg_prep 0000000000000079
>> -io_req_defer_prep 000000000000058e
>> +io_req_defer_prep 0000000000000160
>> +io_req_prep 0000000000000d26
>> -io_sendmsg_prep 000000000000006b
>> -io_statx_prep 00000000000000ed
>> -io_write_prep 00000000000000cd
>>
>>
>>
>>> Feel free to prove me wrong, I'd love to reduce it ;-)
>>
>> With a bit of handholding the compiler can deduplicate the switches. It
>> can't recognize on its own that req->opcode can't change between the
>> switch for prep and issue. Can be solved by moving the opcode into a
>> temporary variable. Also needs an inline for io_req_prep (not surpring,
>> it's a bit large).
>>
>> That results in a bit bigger code. That's partially because of more
>> inlining:
>>    text	   data	    bss	    dec	    hex	filename
>>   78291	   8237	      8	  86536	  15208	fs/io_uring.o
>>   76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
>>
>> symbol size
>> +get_order 0000000000000015
>> -io_close_prep 0000000000000066
>> -io_connect_prep 0000000000000051
>> -io_epoll_ctl_prep 0000000000000051
>> -io_issue_sqe 0000000000001101
>> +io_issue_sqe 00000000000018fa
>> -io_openat2_prep 00000000000000ed
>> -io_openat_prep 0000000000000089
>> -io_poll_add_prep 0000000000000056
>> -io_prep_fsync 0000000000000053
>> -io_prep_sfr 000000000000004e
>> -io_read_prep 00000000000000ca
>> -io_recvmsg_prep 0000000000000079
>> -io_req_defer_prep 000000000000058e
>> +io_req_defer_prep 0000000000000f12
>> -io_sendmsg_prep 000000000000006b
>> -io_statx_prep 00000000000000ed
>> -io_write_prep 00000000000000cd
>>
>>
>> There's still some unnecessary branching on force_nonblocking. The
>> second patch just separates the cases needing force_nonblocking
>> out. Probably not quite the right structure.
>>
>>
>> Oddly enough gcc decides that io_queue_async_work() wouldn't be inlined
>> anymore after that. I'm quite doubtful it's a good candidate anyway?
>> Seems mighty complex, and not likely to win much. That's a noticable
>> win:
>>    text	   data	    bss	    dec	    hex	filename
>>   72857	   8141	      8	  81006	  13c6e	fs/io_uring.o
>>   76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
>> --- /tmp/before.txt	2020-02-23 21:00:16.316753022 -0800
>> +++ /tmp/after.txt	2020-02-23 23:10:44.979496728 -0800
>> -io_commit_cqring 00000000000003ef
>> +io_commit_cqring 000000000000012c
>> +io_free_req 000000000000005e
>> -io_free_req 00000000000002ed
>> -io_issue_sqe 0000000000001101
>> +io_issue_sqe 0000000000000e86
>> -io_poll_remove_one 0000000000000308
>> +io_poll_remove_one 0000000000000074
>> -io_poll_wake 0000000000000498
>> +io_poll_wake 000000000000021c
>> +io_queue_async_work 00000000000002a0
>> -io_queue_sqe 00000000000008cc
>> +io_queue_sqe 0000000000000391
> 
> That's OK, it's slow path, I'd prefer it not to be inlined.
> 
>> Not quite sure what the policy is with attaching POC patches? Also send
>> as separate emails?
> 
> Fine like this, though easier if you inline the patches so it's easier
> to comment on them.
> 
> Agree that the first patch looks fine, though I don't quite see why
> you want to pass in opcode as a separate argument as it's always
> req->opcode. Seeing it separate makes me a bit nervous, thinking that
> someone is reading it again from the sqe, or maybe not passing in
> the right opcode for the given request. So that seems fragile and it
> should go away.

I suppose it's to hint a compiler, that opcode haven't been changed inside the
first switch. And any compiler I used breaks analysis there pretty easy.
Optimising C is such a pain...

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2020-02-24 15:44 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-24  1:07 Deduplicate io_*_prep calls? Andres Freund
2020-02-24  3:17 ` Jens Axboe
2020-02-24  3:33   ` Andres Freund
2020-02-24  3:52     ` Jens Axboe
2020-02-24  7:12       ` Andres Freund
2020-02-24  9:10         ` Pavel Begunkov
2020-02-24 15:40         ` Jens Axboe
2020-02-24 15:44           ` Pavel Begunkov [this message]
2020-02-24 15:46             ` Jens Axboe
2020-02-24 15:50               ` Pavel Begunkov
2020-02-24 15:53                 ` Jens Axboe
2020-02-24 15:56                   ` Pavel Begunkov
2020-02-24 16:02                     ` Jens Axboe
2020-02-24 16:18                       ` Pavel Begunkov
2020-02-24 17:08                         ` Andres Freund
2020-02-24 17:16                           ` Pavel Begunkov
2020-02-25  9:26                 ` Pavel Begunkov
2020-02-27 21:06                   ` Andres Freund
2020-02-24 16:53           ` Andres Freund
2020-02-24 17:19             ` Jens Axboe
2020-02-24 17:30               ` Jens Axboe
2020-02-24 17:37               ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=23a49bca-26a6-ddbd-480b-d7f3caa16c29@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=andres@anarazel.de \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.