Ah, thanks Jim, I wasn't running with sudo. FWIW, here's my output; it jibes with what both Nate and Jim have said:

 

peluse@pels-64:~/spdk/examples/ioat/kperf$ sudo ./ioat_kperf -n 8
Total 8 Channels, Queue_Depth 128, Transfer Size 4096 Bytes, Total Transfer Size 4 GB
Running I/O . . . . . . . . .
Channel 0 Bandwidth 584 MiB/s
Channel 1 Bandwidth 584 MiB/s
Channel 2 Bandwidth 584 MiB/s
Channel 3 Bandwidth 584 MiB/s
Channel 4 Bandwidth 584 MiB/s
Channel 5 Bandwidth 584 MiB/s
Channel 6 Bandwidth 584 MiB/s
Channel 7 Bandwidth 584 MiB/s
Total Channel Bandwidth: 4904 MiB/s
Average Bandwidth Per Channel: 584 MiB/s
peluse@pels-64:~/spdk/examples/ioat/kperf$ sudo ./ioat_kperf -n 4
Total 4 Channels, Queue_Depth 128, Transfer Size 4096 Bytes, Total Transfer Size 4 GB
Running I/O . . . . .
Channel 0 Bandwidth 1258 MiB/s
Channel 1 Bandwidth 1258 MiB/s
Channel 2 Bandwidth 1260 MiB/s
Channel 3 Bandwidth 1255 MiB/s
Total Channel Bandwidth: 5266 MiB/s
Average Bandwidth Per Channel: 1255 MiB/s

peluse@pels-64:~/spdk/examples/ioat/kperf$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Stepping:              2
CPU MHz:               1201.121
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              4591.78
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

 

From: Harris, James R
Sent: Monday, December 4, 2017 9:06 AM
To: Storage Performance Development Kit <spdk@lists.01.org>; Marushak, Nathan <nathan.marushak@intel.com>; Luse, Paul E <paul.e.luse@intel.com>
Subject: Re: [SPDK] ioat performance questions

 

Hi,

 

5 GB/s is the expected aggregate throughput for all of the ioat channels on a single Intel Xeon CPU socket.  All of the channels on one CPU socket share the same hardware “pipe”, so using additional channels from that socket will not increase the overall throughput.
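As a rough sketch of that behavior (assuming a fixed ~5 GB/s per-socket aggregate divided evenly across channels; the exact ceiling varies by platform):

for n in 1 2 4 8; do
    echo "$n channels -> ~$((5120 / n)) MiB/s per channel"
done

That predicts ~640 MiB/s per channel with 8 channels and ~1280 MiB/s with 4, which is in the same ballpark as the numbers measured in this thread.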

 

Note that the ioat channels on the recently released Intel Xeon Scalable processors use this same shared-bandwidth architecture, but with an aggregate throughput closer to 10 GB/s per CPU socket.

 

In the Intel specs, ioat is referred to as Quickdata, so searching on “intel quickdata specification” finds some relevant public links.  Section 3.4 in https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-1600-2600-vol-2-datasheet.pdf has a lot of details on the register definitions.

 

Thanks,

 

-Jim

 

P.S. Hey Paul – you need to run ioat_kperf as root, in addition to making sure that ioat channels are assigned to the kernel ioat driver.
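One quick way to check the driver binding (a sketch; the exact device strings vary by platform, but on E5 v3 the ioat functions show up in lspci as “DMA Channel” devices, and the kernel driver is named ioatdma):

lspci | grep -i 'dma channel'       # list the ioat (CBDMA) PCI functions
ls /sys/bus/pci/drivers/ioatdma/    # PCI addresses bound to the kernel ioat driver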
From: SPDK <spdk-bounces@lists.01.org> on behalf of "huangqingxin@ruijie.com.cn" <huangqingxin@ruijie.com.cn>
Reply-To: Storage Performance Development Kit <spdk@lists.01.org>
Date: Monday, December 4, 2017 at 8:59 AM
To: Nathan Marushak <nathan.marushak@intel.com>, "spdk@lists.01.org" <spdk@lists.01.org>, Paul E Luse <paul.e.luse@intel.com>
Subject: Re: [SPDK] ioat performance questions

 

Hi Nathan

 

Thanks. Where can I get the specification for the DMA engine? And why does the average bandwidth per channel go down as the number of channels grows?

 

From: Marushak, Nathan
Date: 2017-12-04 23:53
To: Storage Performance Development Kit; Luse, Paul E
Subject: Re: [SPDK] ioat performance questions

Depending on the platform you are using, 5 GB/s is likely the expected throughput.

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of huangqingxin@ruijie.com.cn
Sent: Monday, December 04, 2017 8:00 AM
To: Luse, Paul E <paul.e.luse@intel.com>; spdk@lists.01.org
Subject: Re: [SPDK] ioat performance questions

 

hi, Paul

 

Thank you!

If you have run ./scripts/setup.sh, the DMA channels will have been unbound from the kernel driver, which causes the "No DMA channels or devices found" error.

Have you tried rebinding the DMA channels from vfio back to the kernel driver? You can run `./scripts/setup.sh reset`.
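For example (a sketch, assuming the ioatdma kernel module is available on your system):

sudo ./scripts/setup.sh reset    # rebind devices from vfio/uio back to their kernel drivers
lsmod | grep ioatdma             # confirm the kernel ioat driver is loaded
sudo ./ioat_kperf -n 8           # ioat_kperf uses the kernel ioatdma driver and needs root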
From: Luse, Paul E
Date: 2017-12-04 22:19
Subject: Re: [SPDK] ioat performance questions

I’m sure someone else can help. I at least tried to repro your results as another data point, but even after following the directions at

https://github.com/spdk/spdk/tree/master/examples/ioat/kperf I get:

 

peluse@pels-64:~/spdk/examples/ioat/kperf$ ./ioat_kperf -n 8
Cannot set dma channels

 

-Paul

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of huangqingxin@ruijie.com.cn
Sent: Monday, December 4, 2017 6:38 AM
To: spdk@lists.01.org
Subject: [SPDK] ioat performance questions

 

hi, 

 

When I run the ioat_kperf tool provided by SPDK, I get this result.

 

[root@localhost kperf]# ./ioat_kperf -n 8
Total 8 Channels, Queue_Depth 256, Transfer Size 4096 Bytes, Total Transfer Size 4 GB
Running I/O . . . . . . . .
Channel 0 Bandwidth 661 MiB/s
Channel 1 Bandwidth 660 MiB/s
Channel 2 Bandwidth 661 MiB/s
Channel 3 Bandwidth 661 MiB/s
Channel 4 Bandwidth 661 MiB/s
Channel 5 Bandwidth 661 MiB/s
Channel 6 Bandwidth 661 MiB/s
Channel 7 Bandwidth 661 MiB/s
Total Channel Bandwidth: 5544 MiB/s
Average Bandwidth Per Channel: 660 MiB/s
[root@localhost kperf]# ./ioat_kperf -n 4
Total 4 Channels, Queue_Depth 256, Transfer Size 4096 Bytes, Total Transfer Size 4 GB
Running I/O . . . . .
Channel 0 Bandwidth 1319 MiB/s
Channel 1 Bandwidth 1322 MiB/s
Channel 2 Bandwidth 1319 MiB/s
Channel 3 Bandwidth 1318 MiB/s
Total Channel Bandwidth: 5530 MiB/s
Average Bandwidth Per Channel: 1318 MiB/s
[root@localhost kperf]#

 

[root@localhost kperf]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1200.000
CPU max MHz:           2400.0000
CPU min MHz:           1200.0000
BogoMIPS:              4799.90
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23

 

I found that the `Total Channel Bandwidth` does not increase with more channels. What's the limitation? Is the ioat DMA performance on E5 v3 capped at around 5 GB/s?

 

Any help will be appreciated!