Hi, Jim

Thank you for the reply. It helps me get a better understanding of ioat. But I have another question about DMA: will the `first-party` type of DMA be limited by the same hardware "pipe"?
For example, with the PCIe devices listed below, when these devices use DMA to communicate with each other, is the throughput determined by that "pipe" or by the PCIe bus itself?

[root@localhost ntb]# lspci | grep Intel
00:00.0 Host bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2 (rev 02)
00:01.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 (rev 02)
00:02.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02)
00:02.2 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02)
00:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02)
00:03.2 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02)
00:05.0 System peripheral: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Address Map, VTd_Misc, System Management (rev 02)
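
For reference, the negotiated link speed and width of each device can be read with lspci; I assume first-party DMA through a device is bounded by that device's own PCIe link, e.g. for root port 00:01.0 above:

[root@localhost ntb]# lspci -s 00:01.0 -vv | grep -iE 'lnkcap|lnksta'   # LnkCap/LnkSta: the link's maximum and negotiated speed/width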

From: Harris, James R
Date: 2017-12-05 00:05
To: Storage Performance Development Kit; Marushak, Nathan; Luse, Paul E
Subject: Re: [SPDK] ioat performance questions

Hi,

 

5GB/s is the expected aggregate throughput for all of the ioat channels on a single Intel Xeon CPU socket.  All of the channels on one CPU socket share the same hardware “pipe”, so using additional channels from that socket will not increase the overall throughput.
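
To make that concrete with the numbers from your runs below, the aggregate stays roughly flat and simply gets divided among the channels:

    8 channels: 5544 MiB/s total / 8 ≈ 693 MiB/s per channel (you measured ~660)
    4 channels: 5530 MiB/s total / 4 ≈ 1382 MiB/s per channel (you measured ~1318)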

 

Note that the ioat channels on the recently released Intel Xeon Scalable processors have this same shared-bandwidth architecture, but with an aggregate throughput closer to 10GB/s per CPU socket.

 

In the Intel specs, ioat is referred to as QuickData, so searching on “intel quickdata specification” finds some relevant public links.  Section 3.4 in https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-1600-2600-vol-2-datasheet.pdf has a lot of details on the register definitions.

 

Thanks,

 

-Jim

 

P.S. Hey Paul – you need to run ioat_kperf as root, in addition to making sure that ioat channels are assigned to the kernel ioat driver.
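
Something like this should confirm the driver binding, assuming the standard kernel ioatdma driver:

peluse@pels-64:~$ lspci -k | grep -B 2 ioatdma    # each ioat channel should report "Kernel driver in use: ioatdma"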

 

 

 

From: SPDK <spdk-bounces@lists.01.org> on behalf of "huangqingxin@ruijie.com.cn" <huangqingxin@ruijie.com.cn>
Reply-To: Storage Performance Development Kit <spdk@lists.01.org>
Date: Monday, December 4, 2017 at 8:59 AM
To: Nathan Marushak <nathan.marushak@intel.com>, "spdk@lists.01.org" <spdk@lists.01.org>, Paul E Luse <paul.e.luse@intel.com>
Subject: Re: [SPDK] ioat performance questions

 

Hi Nathan

 

Thanks. Where can I find a specification for this DMA engine? And why does the average bandwidth per channel go down as the number of channels grows?

 

From: Marushak, Nathan

Date: 2017-12-04 23:53

Subject: Re: [SPDK] ioat performance questions

Depending on the platform you are using, 5 GB/s is likely the expected throughput.

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of huangqingxin@ruijie.com.cn
Sent: Monday, December 04, 2017 8:00 AM
To: Luse, Paul E <paul.e.luse@intel.com>; spdk@lists.01.org
Subject: Re: [SPDK] ioat performance questions

 

hi, Paul

 

Thank you!

If you have run ./scripts/setup.sh, the DMA channels will have been unbound from the kernel driver, which causes the "No DMA channels or Devices found" error.

Have you tried giving the DMA channels back from vfio? You can run `./scripts/setup.sh reset`.
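
Roughly, assuming you run from the top of the SPDK tree, the sequence looks like this:

[root@localhost spdk]# ./scripts/setup.sh reset   # rebind the devices from vfio/uio back to the kernel drivers
[root@localhost spdk]# ls /sys/class/dma/         # the kernel's ioat channels (dma0chan0, ...) should reappear here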

 

 

From: Luse, Paul E

Date: 2017-12-04 22:19

Subject: Re: [SPDK] ioat performance questions

I’m sure someone else can help. I at least tried to repro your results as another data point, but even after following the directions on

https://github.com/spdk/spdk/tree/master/examples/ioat/kperf I get:

 

peluse@pels-64:~/spdk/examples/ioat/kperf$ ./ioat_kperf -n 8

Cannot set dma channels

 

-Paul

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of huangqingxin@ruijie.com.cn
Sent: Monday, December 4, 2017 6:38 AM
To: spdk@lists.01.org
Subject: [SPDK] ioat performance questions

 

hi, 

 

When I run the ioat_kperf tool provided by SPDK, I get this result.

 

[root@localhost kperf]# ./ioat_kperf -n 8
Total 8 Channels, Queue_Depth 256, Transfer Size 4096 Bytes, Total Transfer Size 4 GB
Running I/O . . . . . . . .
Channel 0 Bandwidth 661 MiB/s
Channel 1 Bandwidth 660 MiB/s
Channel 2 Bandwidth 661 MiB/s
Channel 3 Bandwidth 661 MiB/s
Channel 4 Bandwidth 661 MiB/s
Channel 5 Bandwidth 661 MiB/s
Channel 6 Bandwidth 661 MiB/s
Channel 7 Bandwidth 661 MiB/s
Total Channel Bandwidth: 5544 MiB/s
Average Bandwidth Per Channel: 660 MiB/s
[root@localhost kperf]# ./ioat_kperf -n 4
Total 4 Channels, Queue_Depth 256, Transfer Size 4096 Bytes, Total Transfer Size 4 GB
Running I/O . . . . .
Channel 0 Bandwidth 1319 MiB/s
Channel 1 Bandwidth 1322 MiB/s
Channel 2 Bandwidth 1319 MiB/s
Channel 3 Bandwidth 1318 MiB/s
Total Channel Bandwidth: 5530 MiB/s
Average Bandwidth Per Channel: 1318 MiB/s
[root@localhost kperf]#

 

[root@localhost kperf]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1200.000
CPU max MHz:           2400.0000
CPU min MHz:           1200.0000
BogoMIPS:              4799.90
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23

 

I found that the `Total Channel Bandwidth` does not increase with more channels. What is the limitation? Is the performance of ioat DMA on E5 v3 capped at around 5 GB/s?

 

Any help will be appreciated!