Hi Frank,

While your question is very straightforward, the answer isn't, because "it depends". IOAT is just one mechanism for data movement. A core, a NIC, or other add-in cards can also move data from one place to another. So, depending on how an application is architected and the design choices one makes, the overall throughput can be limited by memory, the NIC, ioat, etc.

I realize this doesn't provide the answer you're looking for... well, perhaps it does in a general sense: no, you can't make that conclusion with the information in this email.

Hope this helps.

Thanks,
Nate

On Dec 7, 2017, at 1:40 AM, Huang Frank wrote:

Hi, Jim

Can I conclude that the upper limit of a server's throughput will be set by the IOAT, if all other conditions (network, etc.) are ideal?

kinzent(a)hotmail.com

From: Harris, James R
Date: 2017-12-06 23:20
To: huangqingxin(a)ruijie.com.cn; spdk(a)lists.01.org; Marushak, Nathan; Luse, Paul E
Subject: Re: [SPDK] ioat performance questions

This 5 GB/s limitation only affects the IOAT DMA engines. It does not affect DMA engines that may exist in other PCIe devices. DMA engines in other PCIe devices are subject to different limitations, including the width and speed of their own PCIe links.

-Jim

From: "huangqingxin(a)ruijie.com.cn"
Date: Wednesday, December 6, 2017 at 7:54 AM
To: James Harris; "spdk(a)lists.01.org"; Nathan Marushak; Paul E Luse
Subject: Re: Re: [SPDK] ioat performance questions

Hi, Jim

Thank you for the reply. It helps me get a good understanding of IOAT. But I have another question about DMA. Will `first-party` DMA be limited by the same hardware "pipe"? For example, given the PCIe devices listed below, when these devices use DMA to communicate with each other, is the throughput determined by that "pipe" or by the PCIe bus itself?

[root(a)localhost ntb]# lspci | grep Intel
00:00.0 Host bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2 (rev 02)
00:01.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 (rev 02)
00:02.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02)
00:02.2 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02)
00:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02)
00:03.2 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02)
00:05.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Address Map, VTd_Misc, System Management (rev 02)

From: Harris, James R
Date: 2017-12-05 00:05
To: Storage Performance Development Kit; Marushak, Nathan; Luse, Paul E
Subject: Re: [SPDK] ioat performance questions

Hi,

5 GB/s is the expected aggregate throughput for all of the ioat channels on a single Intel Xeon CPU socket. All of the channels on one CPU socket share the same hardware "pipe", so using additional channels from that socket will not increase the overall throughput.

Note that the ioat channels on the recently released Intel Xeon Scalable processors share this same shared-bandwidth architecture, but with an aggregate throughput closer to 10 GB/s per CPU socket.

In the Intel specs, ioat is referred to as Quickdata, so searching on "intel quickdata specification" finds some relevant public links. Section 3.4 in https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-1600-2600-vol-2-datasheet.pdf has a lot of detail on the register definitions.

Thanks,

-Jim

P.S. Hey Paul - you need to run ioat_kperf as root, in addition to making sure that the ioat channels are assigned to the kernel ioat driver.
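For anyone who wants to poke at these Quickdata channels directly from user space, below is a minimal, untested sketch that enumerates them through SPDK's spdk_ioat_* API (include/spdk/ioat.h) and prints their copy/fill capabilities. Note that the binding is the opposite of what ioat_kperf needs: the channels must first be handed to uio/vfio (e.g. via scripts/setup.sh) rather than left on the kernel ioatdma driver. The program name and output format are just illustrative, and exact signatures may differ slightly between SPDK releases.

/*
 * Sketch: list IOAT (Quickdata) channels visible to SPDK user space
 * and print whether each supports copy and fill operations.
 * Assumes the channels are bound to uio/vfio (scripts/setup.sh).
 */
#include <stdio.h>
#include <stdbool.h>

#include "spdk/env.h"
#include "spdk/ioat.h"

static int g_channels;

static bool
probe_cb(void *cb_ctx, struct spdk_pci_device *pci_dev)
{
        /* Claim every IOAT channel that setup.sh exposed to user space. */
        return true;
}

static void
attach_cb(void *cb_ctx, struct spdk_pci_device *pci_dev, struct spdk_ioat_chan *chan)
{
        uint32_t cap = spdk_ioat_get_dma_capabilities(chan);

        printf("channel %d: copy %s, fill %s\n", g_channels++,
               (cap & SPDK_IOAT_ENGINE_COPY_SUPPORTED) ? "yes" : "no",
               (cap & SPDK_IOAT_ENGINE_FILL_SUPPORTED) ? "yes" : "no");
}

int
main(void)
{
        struct spdk_env_opts opts;

        spdk_env_opts_init(&opts);
        opts.name = "ioat_list";    /* illustrative name */
        spdk_env_init(&opts);

        if (spdk_ioat_probe(NULL, probe_cb, attach_cb) != 0) {
                fprintf(stderr, "spdk_ioat_probe failed\n");
                return 1;
        }

        printf("found %d ioat channel(s)\n", g_channels);
        return 0;
}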
From: SPDK on behalf of "huangqingxin(a)ruijie.com.cn"
Reply-To: Storage Performance Development Kit
Date: Monday, December 4, 2017 at 8:59 AM
To: Nathan Marushak; "spdk(a)lists.01.org"; Paul E Luse
Subject: Re: [SPDK] ioat performance questions

Hi Nathan,

Thanks. Where can I find the specification for this DMA engine? And why does the average per-channel bandwidth go down as the number of channels grows?

From: Marushak, Nathan
Date: 2017-12-04 23:53
To: Storage Performance Development Kit; Luse, Paul E
Subject: Re: [SPDK] ioat performance questions

Depending on the platform you are using, 5 GB/s is likely the expected throughput.

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of huangqingxin(a)ruijie.com.cn
Sent: Monday, December 04, 2017 8:00 AM
To: Luse, Paul E; spdk(a)lists.01.org
Subject: Re: [SPDK] ioat performance questions

Hi, Paul

Thank you! If you have run ./scripts/setup.sh, the DMA channels will have been unbound from the kernel ioat driver, which causes "No DMA channels or Devices found". Have you tried releasing the DMA channels from vfio back to the kernel? You can run `./scripts/setup.sh reset`.

From: Luse, Paul E
Date: 2017-12-04 22:19
To: Storage Performance Development Kit
Subject: Re: [SPDK] ioat performance questions

I'm sure someone else can help. I at least tried to repro your results as another data point, but even after following the directions on https://github.com/spdk/spdk/tree/master/examples/ioat/kperf I get:

peluse(a)pels-64:~/spdk/examples/ioat/kperf$ ./ioat_kperf -n 8
Cannot set dma channels

-Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of huangqingxin(a)ruijie.com.cn
Sent: Monday, December 4, 2017 6:38 AM
To: spdk(a)lists.01.org
Subject: [SPDK] ioat performance questions

Hi,

When I run the ioat_kperf provided by SPDK, I get these results:

[root(a)localhost kperf]# ./ioat_kperf -n 8
Total 8 Channels, Queue_Depth 256, Transfer Size 4096 Bytes, Total Transfer Size 4 GB
Running I/O . . . . . . . .
Channel 0 Bandwidth 661 MiB/s
Channel 1 Bandwidth 660 MiB/s
Channel 2 Bandwidth 661 MiB/s
Channel 3 Bandwidth 661 MiB/s
Channel 4 Bandwidth 661 MiB/s
Channel 5 Bandwidth 661 MiB/s
Channel 6 Bandwidth 661 MiB/s
Channel 7 Bandwidth 661 MiB/s
Total Channel Bandwidth: 5544 MiB/s
Average Bandwidth Per Channel: 660 MiB/s

[root(a)localhost kperf]# ./ioat_kperf -n 4
Total 4 Channels, Queue_Depth 256, Transfer Size 4096 Bytes, Total Transfer Size 4 GB
Running I/O . . . . .
Channel 0 Bandwidth 1319 MiB/s
Channel 1 Bandwidth 1322 MiB/s
Channel 2 Bandwidth 1319 MiB/s
Channel 3 Bandwidth 1318 MiB/s
Total Channel Bandwidth: 5530 MiB/s
Average Bandwidth Per Channel: 1318 MiB/s

[root(a)localhost kperf]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1200.000
CPU max MHz:           2400.0000
CPU min MHz:           1200.0000
BogoMIPS:              4799.90
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23

I found that the `Total Channel Bandwidth` does not increase with more channels. What is the limitation? Is ioat DMA performance on E5 v3 limited to around 5 GB/s? Any help will be appreciated!
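For what it's worth, the numbers above are consistent with a fixed aggregate: 4 channels at ~1320 MiB/s and 8 channels at ~660 MiB/s both total roughly 5.2 GiB/s, so the per-channel figure is approximately the socket aggregate divided by the channel count. For anyone who wants to reproduce the single-channel measurement from user space rather than through the kernel ioatdma path that ioat_kperf uses, here is a rough, untested sketch against SPDK's spdk_ioat_* API. The queue depth and transfer size mirror the kperf run above; the constants and program name are made up for illustration, and signatures may vary slightly between SPDK releases.

/*
 * Sketch: drive one IOAT channel with 4 KiB copies at queue depth 256
 * and report the achieved bandwidth. Assumes the channel is bound to
 * uio/vfio (scripts/setup.sh), not to the kernel ioatdma driver.
 */
#include <stdio.h>
#include <stdbool.h>

#include "spdk/env.h"
#include "spdk/ioat.h"

#define XFER_SIZE    4096
#define QUEUE_DEPTH  256
#define TOTAL_COPIES (1024 * 1024)   /* 4 GiB moved in total */

static struct spdk_ioat_chan *g_chan;
static uint64_t g_completed;

static bool
probe_cb(void *cb_ctx, struct spdk_pci_device *pci_dev)
{
        return g_chan == NULL;       /* take the first channel only */
}

static void
attach_cb(void *cb_ctx, struct spdk_pci_device *pci_dev, struct spdk_ioat_chan *chan)
{
        g_chan = chan;
}

static void
copy_done(void *arg)
{
        g_completed++;
}

int
main(void)
{
        struct spdk_env_opts opts;
        void *src, *dst;
        uint64_t submitted = 0, start, end, hz;

        spdk_env_opts_init(&opts);
        opts.name = "ioat_bw";       /* illustrative name */
        spdk_env_init(&opts);

        if (spdk_ioat_probe(NULL, probe_cb, attach_cb) != 0 || g_chan == NULL) {
                fprintf(stderr, "no ioat channel available\n");
                return 1;
        }

        src = spdk_dma_zmalloc(XFER_SIZE, 0x1000, NULL);
        dst = spdk_dma_zmalloc(XFER_SIZE, 0x1000, NULL);
        if (src == NULL || dst == NULL) {
                fprintf(stderr, "buffer allocation failed\n");
                return 1;
        }

        hz = spdk_get_ticks_hz();
        start = spdk_get_ticks();

        while (g_completed < TOTAL_COPIES) {
                /* Keep up to QUEUE_DEPTH copies outstanding on the channel. */
                while (submitted < TOTAL_COPIES && submitted - g_completed < QUEUE_DEPTH) {
                        if (spdk_ioat_submit_copy(g_chan, NULL, copy_done,
                                                  dst, src, XFER_SIZE) < 0) {
                                break;
                        }
                        submitted++;
                }
                spdk_ioat_process_events(g_chan);
        }

        end = spdk_get_ticks();
        printf("single channel: %.0f MiB/s\n",
               (double)TOTAL_COPIES * XFER_SIZE /
               ((double)(end - start) / hz) / (1024 * 1024));

        spdk_dma_free(src);
        spdk_dma_free(dst);
        spdk_ioat_detach(g_chan);
        return 0;
}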