From: Jens Axboe <axboe@kernel.dk>
To: Hans-Peter Lehmann <hans-peter.lehmann@kit.edu>,
	"fio@vger.kernel.org" <fio@vger.kernel.org>
Subject: Re: Question: t/io_uring performance
Date: Wed, 8 Sep 2021 10:20:27 -0600	[thread overview]
Message-ID: <4ce4addd-a7c7-f35f-ef3b-b0bf9966e224@kernel.dk> (raw)
In-Reply-To: <1cf066bb-aa71-1403-c80c-454ea87a9502@kit.edu>

On 9/8/21 10:12 AM, Hans-Peter Lehmann wrote:
> Hi Jens,
> 
> thank you for your reply. Given that you read the rest of the thread after your first reply, I think some of the questions from your first email are no longer relevant. I still answered them at the bottom for completeness, but I will address the more interesting ones first.
> 
>> I turn off iostats and merging for the device.
> 
> Doing this helped quite a bit. The 512b reads went from 715K to 800K. The 4096b reads went from 570K to 630K.
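
For anyone following along: both of those are plain sysfs toggles. A
minimal sketch, assuming the device is nvme0n1:

  # turn off per-IO accounting, and disable all merge attempts
  echo 0 > /sys/block/nvme0n1/queue/iostats
  echo 2 > /sys/block/nvme0n1/queue/nomerges
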
> 
>> Note that you'll need to configure NVMe
>> to properly use polling. I use 32 poll queues, number isn't really
>> that important for single core testing, as long as there's enough to
>> have a poll queue local to CPU being tested on.
> 
> My SSD was configured to use 128/0/0 default/read/poll queues. I added
> "nvme.poll_queues=32" to GRUB and rebooted, which changed it to
> 96/0/32. I now get 1.0M IOPS (512b blocks) and 790K IOPS (4096b
> blocks) using a single core. Thank you very much, this probably was
> the main bottleneck. Launching the benchmark two times with 512b
> blocks, I get 1.4M IOPS total.
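
For reference, a sketch of that setup, assuming a Debian-style GRUB
(the config path and update command vary by distro):

  # /etc/default/grub: append the module parameter to the kernel cmdline
  GRUB_CMDLINE_LINUX_DEFAULT="... nvme.poll_queues=32"

  # regenerate the grub config and reboot
  update-grub && reboot

  # after boot, verify the default/read/poll queue split
  dmesg | grep 'default/read/poll'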

Sounds like IRQs are expensive on your box; it does vary quite a bit
between systems.

What's the advertised peak random read performance of the devices you
are using?

> Starting a single-threaded t/io_uring with two SSDs still achieves "only" 1.0M IOPS, regardless of the block size. In your benchmarks from 2019 [0], when Linux 5.4 (which I am using) was current, you achieved 1.6M IOPS (4096b blocks) using a single core. I get the full 1.6M IOPS that saturate both SSDs (4096b blocks) only when running t/io_uring with two threads. This makes me think that there is still another configuration option that I am missing. Most of the time is spent in the kernel.
> 
> # time taskset -c 48 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 /dev/nvme0n1 /dev/nvme1n1
> i 8, argc 10
> Added file /dev/nvme0n1 (submitter 0)
> Added file /dev/nvme1n1 (submitter 0)
> sq_ring ptr = 0x0x7f78fb740000
> sqes ptr    = 0x0x7f78fb73e000
> cq_ring ptr = 0x0x7f78fb73c000
> polled=1, fixedbufs=1, register_files=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
> submitter=2336
> IOPS=1014252, IOS/call=31/31, inflight=102 (38, 64)
> IOPS=1017984, IOS/call=31/31, inflight=123 (64, 59)
> IOPS=1018220, IOS/call=31/31, inflight=102 (38, 64)
> [...]
> real    0m7.898s
> user    0m0.144s
> sys     0m7.661s
> 
> I attached a perf output to the email. It was generated using the same parameters as above (getting 1.0M IOPS).
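
(Decoding the flags, per t/io_uring's usage: -b is the block size, -d
the queue depth, -s/-c the per-call submit/complete batch sizes, -p1
polled I/O, -F1 registered files, and -B1 registered (fixed) buffers.)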

Looking at the perf trace, one issue is pretty apparent:

     7.54%  io_uring  [kernel.kallsyms]  [k] read_tsc                           

which means you're spending ~8% of the workload's time just reading
time stamps. As is often the case once you get near core limits,
that'll realistically cut more than 8% off your perf. Did you turn off
iostats? If so, then there are a few things in the kernel config that
can cause this. One is BLK_CGROUP_IOCOST - is that enabled? There might
be more if you're still on that old kernel.
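
A quick way to check (the /boot path is an assumption, distros differ;
/proc/config.gz needs CONFIG_IKCONFIG_PROC):

  grep BLK_CGROUP_IOCOST /boot/config-$(uname -r)
  # or, if the running kernel exposes its config:
  zgrep BLK_CGROUP_IOCOST /proc/config.gz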

It would be handy to have -g enabled for your perf record and report,
since that would show us exactly who's calling the expensive bits. The
next suspect is memset(), which also looks expensive, but that may be
related to:

https://git.kernel.dk/cgit/linux-block/commit/block/bio.c?id=da521626ac620d8719d674a48b8ec3620eefd42a
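
For the -g capture, something along these lines would do (a sketch; the
CPU number is taken from your taskset invocation above):

  perf record -g -C 48 -- sleep 10
  perf report -g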

-- 
Jens Axboe

