From: Jens Axboe <axboe@kernel.dk>
To: Bart Van Assche <bvanassche@acm.org>
Cc: linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] block: Optimize bio_init()
Date: Sat, 11 Sep 2021 16:16:38 -0600
Message-ID: <fe7f7cc7-2403-7ec6-7c1c-abb6ac6a68fa@kernel.dk>
In-Reply-To: <c810ce05-0893-d8c8-f288-0e018b0a08ca@kernel.dk>
On 9/11/21 4:09 PM, Jens Axboe wrote:
> On 9/11/21 4:01 PM, Jens Axboe wrote:
>> On 9/11/21 3:47 PM, Bart Van Assche wrote:
>>> The following test:
>>>
>>> sudo taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 /dev/nullb0
>>>
>>> reports 1366 K IOPS on my test setup without this patch and 1380 K IOPS
>>> with this patch applied. In other words, this patch realizes a 1%
>>> performance improvement. I think this is because this patch makes the
>>> compiler generate better code. See also commit da521626ac62 ("bio:
>>> optimize initialization of a bio").
>>>
>>> The assembler code generated by gcc without this patch is as follows:
>>>
>>> 0x0000000000000000 <+0>: call 0x5 <bio_init+5>
>>> 0x0000000000000005 <+5>: xor %eax,%eax
>>> 0x0000000000000007 <+7>: xor %ecx,%ecx
>>> 0x0000000000000009 <+9>: movl $0x1,0x1c(%rdi)
>>> 0x0000000000000010 <+16>: movq $0x0,(%rdi)
>>> 0x0000000000000017 <+23>: movq $0x0,0x8(%rdi)
>>> 0x000000000000001f <+31>: movq $0x0,0x10(%rdi)
>>> 0x0000000000000027 <+39>: mov %ax,0x18(%rdi)
>>> 0x000000000000002b <+43>: movb $0x0,0x1a(%rdi)
>>> 0x000000000000002f <+47>: movq $0x0,0x20(%rdi)
>>> 0x0000000000000037 <+55>: movq $0x0,0x28(%rdi)
>>> 0x000000000000003f <+63>: movl $0x0,0x30(%rdi)
>>> 0x0000000000000046 <+70>: movq $0x0,0x38(%rdi)
>>> 0x000000000000004e <+78>: movq $0x0,0x40(%rdi)
>>> 0x0000000000000056 <+86>: movq $0x0,0x48(%rdi)
>>> 0x000000000000005e <+94>: movq $0x0,0x50(%rdi)
>>> 0x0000000000000066 <+102>: movq $0x0,0x58(%rdi)
>>> 0x000000000000006e <+110>: movq $0x0,0x60(%rdi)
>>> 0x0000000000000076 <+118>: mov %cx,0x68(%rdi)
>>> 0x000000000000007a <+122>: movl $0x1,0x6c(%rdi)
>>> 0x0000000000000081 <+129>: mov %dx,0x6a(%rdi)
>>> 0x0000000000000085 <+133>: mov %rsi,0x70(%rdi)
>>> 0x0000000000000089 <+137>: movq $0x0,0x78(%rdi)
>>> 0x0000000000000091 <+145>: ret
>>>
>>> With this patch bio_init() is compiled into the following assembly code:
>>>
>>> 0x0000000000000000 <+0>: call 0x5 <bio_init+5>
>>> 0x0000000000000005 <+5>: mov %rdi,%r8
>>> 0x0000000000000008 <+8>: mov $0x10,%ecx
>>> 0x000000000000000d <+13>: xor %eax,%eax
>>> 0x000000000000000f <+15>: rep stos %rax,%es:(%rdi)
>>> 0x0000000000000012 <+18>: movl $0x1,0x1c(%r8)
>>> 0x000000000000001a <+26>: mov %dx,0x6a(%r8)
>>> 0x000000000000001f <+31>: movl $0x1,0x6c(%r8)
>>> 0x0000000000000027 <+39>: mov %rsi,0x70(%r8)
>>> 0x000000000000002b <+43>: ret
>>>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>> ---
>>> block/bio.c | 45 ++++++++-------------------------------------
>>> 1 file changed, 8 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/block/bio.c b/block/bio.c
>>> index 5df3dd282e40..775cd4274523 100644
>>> --- a/block/bio.c
>>> +++ b/block/bio.c
>>> @@ -244,47 +244,18 @@ static void bio_free(struct bio *bio)
>>> }
>>>
>>> /*
>>> - * Users of this function have their own bio allocation. Subsequently,
>>> - * they must remember to pair any call to bio_init() with bio_uninit()
>>> - * when IO has completed, or when the bio is released.
>>> + * Users of this function must pair any call to bio_init() with a call to
>>> + * bio_uninit() after IO has completed or when the bio is released.
>>> */
>>> void bio_init(struct bio *bio, struct bio_vec *table,
>>> unsigned short max_vecs)
>>> {
>>> - bio->bi_next = NULL;
>>> - bio->bi_bdev = NULL;
>>> - bio->bi_opf = 0;
>>> - bio->bi_flags = 0;
>>> - bio->bi_ioprio = 0;
>>> - bio->bi_write_hint = 0;
>>> - bio->bi_status = 0;
>>> - bio->bi_iter.bi_sector = 0;
>>> - bio->bi_iter.bi_size = 0;
>>> - bio->bi_iter.bi_idx = 0;
>>> - bio->bi_iter.bi_bvec_done = 0;
>>> - bio->bi_end_io = NULL;
>>> - bio->bi_private = NULL;
>>> -#ifdef CONFIG_BLK_CGROUP
>>> - bio->bi_blkg = NULL;
>>> - bio->bi_issue.value = 0;
>>> -#ifdef CONFIG_BLK_CGROUP_IOCOST
>>> - bio->bi_iocost_cost = 0;
>>> -#endif
>>> -#endif
>>> -#ifdef CONFIG_BLK_INLINE_ENCRYPTION
>>> - bio->bi_crypt_context = NULL;
>>> -#endif
>>> -#ifdef CONFIG_BLK_DEV_INTEGRITY
>>> - bio->bi_integrity = NULL;
>>> -#endif
>>> - bio->bi_vcnt = 0;
>>> -
>>> - atomic_set(&bio->__bi_remaining, 1);
>>> - atomic_set(&bio->__bi_cnt, 1);
>>> -
>>> - bio->bi_max_vecs = max_vecs;
>>> - bio->bi_io_vec = table;
>>> - bio->bi_pool = NULL;
>>> + *bio = (struct bio) {
>>> + .__bi_remaining = ATOMIC_INIT(1),
>>> + .__bi_cnt = ATOMIC_INIT(1),
>>> + .bi_max_vecs = max_vecs,
>>> + .bi_io_vec = table,
>>> + };
>>> }
>>
>> I'll give this a whirl too, another upside is that it's less prone to
>> errors if struct bio is changed.
>
> Seems slower for me, by about 1-1.5%, which is consumed by
> bio_alloc_kiocb() which is the only bio_init() caller in my test. Using
> gcc 11.1 here, and my code generation seems to match your case too
> (series of mov vs rep stos with the patch).
>
> Probably a CPU thing. I'm running on an AMD 3970X for these tests.
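
For reference, here is a reduced user-space sketch of the two initialization styles in the patch. The struct, its field subset and the DEMO_ATOMIC_INIT() macro are hypothetical stand-ins; atomic_t is modeled as a struct wrapping an int, and this is not the real struct bio layout. With the compound literal, every member that is not named is implicitly zeroed. That lets the compiler clear the whole object with one block store and then write the four non-zero members, which matches the rep stos plus four mov instructions in the patched listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's atomic_t, which wraps an int in a
 * struct; a braced initializer is therefore valid for it. */
typedef struct { int counter; } demo_atomic_t;
#define DEMO_ATOMIC_INIT(i) { (i) }

struct demo_bvec { void *page; unsigned int len, offset; };

/* Much-reduced, hypothetical struct: NOT the real struct bio layout. */
struct demo_bio {
	struct demo_bio *bi_next;
	void *bi_bdev;
	unsigned int bi_opf;
	unsigned short bi_flags;
	unsigned char bi_status;
	unsigned long long bi_sector;
	void (*bi_end_io)(struct demo_bio *);
	void *bi_private;
	unsigned short bi_vcnt;
	unsigned short bi_max_vecs;
	demo_atomic_t remaining;
	demo_atomic_t cnt;
	struct demo_bvec *bi_io_vec;
	void *bi_pool;
};

/* Pre-patch style: every member is written explicitly. A member added to
 * the struct later stays uninitialized until this list is updated. */
static void demo_bio_init_fields(struct demo_bio *bio, struct demo_bvec *table,
				 unsigned short max_vecs)
{
	bio->bi_next = NULL;
	bio->bi_bdev = NULL;
	bio->bi_opf = 0;
	bio->bi_flags = 0;
	bio->bi_status = 0;
	bio->bi_sector = 0;
	bio->bi_end_io = NULL;
	bio->bi_private = NULL;
	bio->bi_vcnt = 0;
	bio->bi_max_vecs = max_vecs;
	bio->remaining.counter = 1;
	bio->cnt.counter = 1;
	bio->bi_io_vec = table;
	bio->bi_pool = NULL;
}

/* Patch style: members not named in the compound literal are initialized
 * as if they had static storage duration, i.e. to 0 or NULL. */
static void demo_bio_init_literal(struct demo_bio *bio, struct demo_bvec *table,
				  unsigned short max_vecs)
{
	*bio = (struct demo_bio) {
		.remaining = DEMO_ATOMIC_INIT(1),
		.cnt = DEMO_ATOMIC_INIT(1),
		.bi_max_vecs = max_vecs,
		.bi_io_vec = table,
	};
}

/* Compare member by member: padding bytes are unspecified after a struct
 * assignment, so memcmp() over the whole object would not be meaningful. */
static bool demo_bio_same(const struct demo_bio *a, const struct demo_bio *b)
{
	return a->bi_next == b->bi_next && a->bi_bdev == b->bi_bdev &&
	       a->bi_opf == b->bi_opf && a->bi_flags == b->bi_flags &&
	       a->bi_status == b->bi_status && a->bi_sector == b->bi_sector &&
	       a->bi_end_io == b->bi_end_io && a->bi_private == b->bi_private &&
	       a->bi_vcnt == b->bi_vcnt && a->bi_max_vecs == b->bi_max_vecs &&
	       a->remaining.counter == b->remaining.counter &&
	       a->cnt.counter == b->cnt.counter &&
	       a->bi_io_vec == b->bi_io_vec && a->bi_pool == b->bi_pool;
}
```

Pre-filling two objects with a junk pattern such as 0xa5, initializing one with each function and then calling demo_bio_same() is one way to check that the literal version clears everything the explicit version did. A member added to struct demo_bio later would be zeroed by demo_bio_init_literal() with no code change, which is the robustness upside mentioned earlier in the thread.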

Looking at the profile:

 43.34 │ rep stos %rax,%es:(%rdi)

I do wonder if rep stos is just not very well suited for small regions,
either in general or particularly on AMD.
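
One way to probe this outside the kernel is a user-space microbenchmark along the following lines. It is a sketch under several assumptions. It targets x86-64 with GCC or Clang extended asm and uses a 128-byte clear (16 × 8-byte words) to match the `mov $0x10,%ecx` / `rep stos %rax,%es:(%rdi)` sequence in the patched bio_init(). It also uses a volatile pointer to force one 8-byte store per word, roughly like the unpatched listing. The helper names clear_rep_stosq(), clear_movq() and bench_ns_per_call() are hypothetical. A hot, cache-resident buffer in a tight loop can behave quite differently from bio_init() in the I/O path, so any numbers would only be indicative.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void (*clear_fn)(void *dst, size_t nwords);

/* Clear nwords 8-byte words with a single "rep stosq", the pattern gcc
 * emitted for the patched bio_init() (rcx = 0x10 words = 128 bytes there).
 * The x86-64 SysV ABI guarantees the direction flag is clear on entry, so
 * the string store walks forward from dst. */
static void clear_rep_stosq(void *dst, size_t nwords)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ volatile("rep stosq"
			 : "+D"(dst), "+c"(nwords)
			 : "a"((uint64_t)0)
			 : "memory");
#else
	/* Fallback so the sketch still builds on other targets. */
	memset(dst, 0, nwords * sizeof(uint64_t));
#endif
}

/* One 8-byte store per word, similar to the run of movq in the unpatched
 * listing; volatile stops the compiler from turning the loop into a
 * memset() call or a rep stos of its own. */
static void clear_movq(void *dst, size_t nwords)
{
	volatile uint64_t *p = dst;

	for (size_t i = 0; i < nwords; i++)
		p[i] = 0;
}

/* Average wall-clock nanoseconds per call. Calling through a function
 * pointer is meant to keep either variant from being inlined into the
 * timing loop. */
static double bench_ns_per_call(clear_fn fn, void *buf, size_t nwords,
				long iters)
{
	struct timespec t0, t1;

	assert(iters > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iters; i++)
		fn(buf, nwords);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

For example, bench_ns_per_call(clear_rep_stosq, buf, 16, 10000000) and bench_ns_per_call(clear_movq, buf, 16, 10000000) could be compared on a 16-word uint64_t buffer. Running the pair on both AMD and Intel machines would help separate a general small-region cost from a vendor-specific one.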

What do your profiles look like before and after the patch?

--
Jens Axboe

Thread overview: 9+ messages
2021-09-11 21:47 [PATCH] block: Optimize bio_init() Bart Van Assche
2021-09-11 22:01 ` Jens Axboe
2021-09-11 22:09 ` Jens Axboe
2021-09-11 22:16 ` Jens Axboe [this message]
2021-09-12 3:19 ` Bart Van Assche
2021-09-12 13:03 ` Jens Axboe
2021-09-12 22:01 ` Bart Van Assche
2021-09-12 22:13 ` Jens Axboe
2021-09-13 3:52 ` Bart Van Assche