Hi Jens,

thank you for your reply. Given that you have read the thread after the
first reply, I think some of the questions in your first email are no
longer relevant. I have still answered them at the bottom for
completeness, but I will answer the more interesting ones first.

> I turn off iostats and merging for the device.

Doing this helped quite a bit. The 512b reads went from 715K to 800K
IOPS. The 4096b reads went from 570K to 630K IOPS.

> Note that you'll need to configure NVMe to properly use polling. I use
> 32 poll queues, number isn't really that important for single core
> testing, as long as there's enough to have a poll queue local to CPU
> being tested on.

My SSD was configured to use 128/0/0 default/read/poll queues. I added
"nvme.poll_queues=32" to the kernel command line in GRUB and rebooted,
which changed it to 96/0/32. I now get 1.0M IOPS (512b blocks) and 790K
IOPS (4096b blocks) using a single core. Thank you very much, this was
probably the main bottleneck.

Launching the benchmark twice in parallel with 512b blocks, I get 1.4M
IOPS in total. Starting a single-threaded t/io_uring with two SSDs still
achieves "only" 1.0M IOPS, regardless of the block size.

In your benchmarks from 2019 [0], when Linux 5.4 (which I am using) was
current, you achieved 1.6M IOPS (4096b blocks) using a single core. I
get the full 1.6M IOPS needed to saturate both SSDs (4096b blocks) only
when running t/io_uring with two threads. This makes me think that there
is still another configuration option that I am missing.

Most of the time is spent in the kernel:

# time taskset -c 48 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 /dev/nvme0n1 /dev/nvme1n1
i 8, argc 10
Added file /dev/nvme0n1 (submitter 0)
Added file /dev/nvme1n1 (submitter 0)
sq_ring ptr = 0x0x7f78fb740000
sqes ptr = 0x0x7f78fb73e000
cq_ring ptr = 0x0x7f78fb73c000
polled=1, fixedbufs=1, register_files=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
submitter=2336
IOPS=1014252, IOS/call=31/31, inflight=102 (38, 64)
IOPS=1017984, IOS/call=31/31, inflight=123 (64, 59)
IOPS=1018220, IOS/call=31/31, inflight=102 (38, 64)
[...]

real    0m7.898s
user    0m0.144s
sys     0m7.661s

I attached a perf output to the email. It was generated using the same
parameters as above (getting 1.0M IOPS).

Thank you very much for your help. I look forward to hearing from you
again, and I hope to be able to fully reproduce your measurements soon.

Hans-Peter

=== Answers to (I think) no longer relevant questions ===

> The options I run t/io_uring with have been posted multiple times,
> it's this one

This is the same configuration that I ran (I did not explicitly specify
the parameters whose values match the defaults).

> Make sure your nvme device is using 'none' as the IO scheduler.

The scheduler is set to 'none'.

> Is this a gen2 optane?

It is not an Optane disk, but I also do not expect to get insanely high
numbers like in your recent benchmarks, just something closer to the old
benchmarks, but using two SSDs.

=== References ===

[0]: https://twitter.com/axboe/status/1174777844313911296
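
P.S. In case it helps when comparing setups: the settings discussed
above (iostats, merging, IO scheduler, polling, number of poll queues)
can be read back from sysfs. The following is only a sketch, assuming
the usual mainline sysfs layout (/sys/block/<dev>/queue/... and
/sys/module/nvme/parameters/poll_queues). The function name
show_nvme_tuning and the SYSFS_ROOT override are made up for
illustration and are not part of any tool.

```shell
#!/bin/sh
# Print the block-layer settings discussed in this thread for one device.
# SYSFS_ROOT is a hypothetical override (default: /sys) so that the
# function can also be tried against a fake directory tree.
show_nvme_tuning() {
    dev="$1"
    root="${SYSFS_ROOT:-/sys}"

    # Per-device request queue attributes.
    for attr in iostats nomerges scheduler io_poll; do
        f="$root/block/$dev/queue/$attr"
        if [ -r "$f" ]; then
            printf '%s %s=%s\n' "$dev" "$attr" "$(cat "$f")"
        else
            printf '%s %s=<unavailable>\n' "$dev" "$attr"
        fi
    done

    # Module-wide nvme parameter (set via nvme.poll_queues=N at boot).
    f="$root/module/nvme/parameters/poll_queues"
    if [ -r "$f" ]; then
        printf 'nvme poll_queues=%s\n' "$(cat "$f")"
    else
        printf 'nvme poll_queues=<unavailable>\n'
    fi
}
```

It would be called as "show_nvme_tuning nvme0n1". For the setup
described in this thread I would expect iostats=0, nomerges=2 (all
merging disabled), "[none]" selected in the scheduler line, and
poll_queues=32.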