From: Hans-Peter Lehmann <hans-peter.lehmann@kit.edu>
To: Erwan Velu <e.velu@criteo.com>, fio@vger.kernel.org
Subject: Re: Question: t/io_uring performance
Date: Thu, 26 Aug 2021 17:57:16 +0200	[thread overview]
Message-ID: <5b58a227-c376-1f3e-7a10-1aa5483bdc0d@kit.edu> (raw)
In-Reply-To: <867506cc-642e-1047-08c6-aae60e7294c5@criteo.com>


Thank you very much for your reply.

> You didn't mention the size of your P4510

Sorry, the P4510 SSDs each have 2 TB.

> Did you check how your NVMe drives are connected via their PCIe lanes? It's obvious here that you need multiple PCIe Gen3 links to reach 1.6M IOPS (I'd say two).

If I understand the lspci output (listed below) correctly, the SSDs are connected directly to the same PCIe root complex, each getting its maximum of x4 lanes. Given that I can saturate the SSDs when using two t/io_uring instances, I don't think the hardware-side connection is the limitation - or am I missing something?
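
As a back-of-the-envelope check: with t/io_uring's default 4 KiB block size, 1.6M IOPS corresponds to about 6.5 GB/s, while a single Gen3 x4 link tops out near 3.9 GB/s, so two links are indeed needed. The negotiated link of each drive can be verified directly (bus addresses taken from the lspci output below):

# lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'
# cat /sys/bus/pci/devices/0000:01:00.0/current_link_speed
# cat /sys/bus/pci/devices/0000:01:00.0/current_link_width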

> Then, considering the EPYC processor, what's your current NUMA configuration?

The processor was configured to use a single NUMA node (NPS=1). I just tried switching to NPS=4 and ran the benchmark on a core belonging to the SSDs' NUMA node (using numactl). That brought the IOPS from 580k to 590k - still nowhere near the values that Jens got.
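
For reference, the pinned run looked roughly like this (the node number and device path are from my setup and would differ elsewhere):

# cat /sys/bus/pci/devices/0000:01:00.0/numa_node
0
# numactl --cpunodebind=0 --membind=0 t/io_uring /dev/nvme0n1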

> If you want to run a single-core benchmark, you should also check how the IRQs are pinned across the cores and NUMA domains (even if it's a single-socket CPU).

Is IRQ pinning the "big thing" that will double the IOPS? To me, it sounds like something else must be wrong. I will definitely try it, though.
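
If I understand it correctly, the manual approach would be roughly the following sketch (the NVMe driver registers one MSI-X vector per queue, so the grep lists several IRQs per drive; <irq> and <cpu-mask> are placeholders):

# systemctl stop irqbalance
# grep nvme /proc/interrupts
# echo <cpu-mask> > /proc/irq/<irq>/smp_affinity

Here <cpu-mask> is a hex mask selecting the core(s) close to the device, e.g. the core the benchmark is pinned to.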


= Details =

# lspci -tv
-+-[0000:c0]-+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
 |           +- [...]
 +-[0000:80]-+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
 |           +- [...]
 +-[0000:40]-+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
 |           +- [...]
 \-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
             +- [...]
             +-03.1-[01]----00.0  Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
             +-03.2-[02]----00.0  Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]

# lspci -vv
01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express])
         Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE 2.5" U.2 (P4510)
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
         Latency: 0, Cache Line Size: 64 bytes
         Interrupt: pin A routed to IRQ 65
         NUMA node: 0
         [...]
         Capabilities: [60] Express (v2) Endpoint, MSI 00
                 LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s, Exit Latency L0s <64ns
                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                 LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                 [...]
02:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express])
         Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE 2.5" U.2 (P4510)
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
         Latency: 0, Cache Line Size: 64 bytes
         Interrupt: pin A routed to IRQ 67
         NUMA node: 0
         [...]
         Capabilities: [60] Express (v2) Endpoint, MSI 00
                 LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s, Exit Latency L0s <64ns
                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                 LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                 [...]


Thread overview: 26+ messages
2021-08-25 15:57 Question: t/io_uring performance Hans-Peter Lehmann
2021-08-26  7:27 ` Erwan Velu
2021-08-26 15:57   ` Hans-Peter Lehmann [this message]
2021-08-27  7:20     ` Erwan Velu
2021-09-01 10:36       ` Hans-Peter Lehmann
2021-09-01 13:17         ` Erwan Velu
2021-09-01 14:02           ` Hans-Peter Lehmann
2021-09-01 14:05             ` Erwan Velu
2021-09-01 14:17               ` Erwan Velu
2021-09-06 14:26                 ` Hans-Peter Lehmann
2021-09-06 14:41                   ` Erwan Velu
2021-09-08 11:53                   ` Sitsofe Wheeler
2021-09-08 12:22                     ` Jens Axboe
2021-09-08 12:41                       ` Jens Axboe
2021-09-08 16:12                         ` Hans-Peter Lehmann
2021-09-08 16:20                           ` Jens Axboe
2021-09-08 21:24                             ` Hans-Peter Lehmann
2021-09-08 21:34                               ` Jens Axboe
2021-09-10 11:25                                 ` Hans-Peter Lehmann
2021-09-10 11:45                                   ` Erwan Velu
2021-09-08 12:33                 ` Jens Axboe
2021-09-08 17:11                   ` Erwan Velu
2021-09-08 22:37                     ` Erwan Velu
2021-09-16 21:18                       ` Erwan Velu
2021-09-21  7:05                         ` Erwan Velu
2021-09-22 14:45                           ` Hans-Peter Lehmann
