* Connect-IB not performing as well as ConnectX-3 with iSER
@ 2016-06-06 22:36 Robert LeBlanc
       [not found] ` <CAANLjFoL5zow4f4RXP5t8LM7wsWN1OQ-hD2mtPUBTLkJ7UZ5kA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-06 22:36 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

I'm trying to understand why our Connect-IB card is not performing as
well as our ConnectX-3 card. There are three ports across the two
cards and 12 paths to the iSER target, which is backed by a RAM disk.
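
For reference, the paths are ordinary open-iscsi iSER sessions; a rough
sketch of bringing one up (the IQN below is a placeholder, not the real
target name):

iscsiadm -m discovery -t sendtargets -p 10.218.128.17
iscsiadm -m node -T iqn.2016-06.example:ramdisk -p 10.218.128.17 \
    -o update -n iface.transport_name -v iser
iscsiadm -m node -T iqn.2016-06.example:ramdisk -p 10.218.128.17 -l
# repeated for the other 11 portal addresses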

8: ib0.9770@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 65520
qdisc pfifo_fast state UP group default qlen 256
   link/infiniband
80:00:02:0a:fe:80:00:00:00:00:00:00:0c:c4:7a:ff:ff:4f:e5:d1 brd
00:ff:ff:ff:ff:12:40:1b:97:70:00:00:00:00:00:00:ff:ff:ff:ff
   inet 10.218.128.17/16 brd 10.218.255.255 scope global ib0.9770
   inet 10.218.202.17/16 brd 10.218.255.255 scope global secondary ib0.9770:0
   inet 10.218.203.17/16 brd 10.218.255.255 scope global secondary ib0.9770:1
   inet 10.218.204.17/16 brd 10.218.255.255 scope global secondary ib0.9770:2
   inet6 fe80::ec4:7aff:ff4f:e5d1/64 scope link
9: ib1.9770@ib1: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 65520
qdisc pfifo_fast state UP group default qlen 256
   link/infiniband
80:00:00:2d:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:00:df:90 brd
00:ff:ff:ff:ff:12:40:1b:97:70:00:00:00:00:00:00:ff:ff:ff:ff
   inet 10.219.128.17/16 brd 10.219.255.255 scope global ib1.9770
   inet 10.219.202.17/16 brd 10.219.255.255 scope global secondary ib1.9770:0
   inet 10.219.203.17/16 brd 10.219.255.255 scope global secondary ib1.9770:1
   inet 10.219.204.17/16 brd 10.219.255.255 scope global secondary ib1.9770:2
   inet6 fe80::e61d:2d03:0:df90/64 scope link
10: ib2.9770@ib2: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 65520
qdisc pfifo_fast state UP group default qlen 256
   link/infiniband
80:00:00:2f:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:00:df:98 brd
00:ff:ff:ff:ff:12:40:1b:97:70:00:00:00:00:00:00:ff:ff:ff:ff
   inet 10.220.128.17/16 brd 10.220.255.255 scope global ib2.9770
   inet 10.220.202.17/16 brd 10.220.255.255 scope global secondary ib2.9770:0
   inet 10.220.203.17/16 brd 10.220.255.255 scope global secondary ib2.9770:1
   inet 10.220.204.17/16 brd 10.220.255.255 scope global secondary ib2.9770:2
   inet6 fe80::e61d:2d03:0:df98/64 scope link

The ConnectX-3 card is ib0 and Connect-IB is ib{1,2}.
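
One quick way to confirm which IPoIB parent interface sits on which HCA
is to compare the PCI devices behind them in sysfs (standard paths; the
resolved bus addresses will obviously differ per machine):

readlink -f /sys/class/net/ib0/device
readlink -f /sys/class/net/ib1/device
readlink -f /sys/class/infiniband/mlx4_0/device
readlink -f /sys/class/infiniband/mlx5_0/device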

# ibv_devinfo
hca_id: mlx5_0
       transport:                      InfiniBand (0)
       fw_ver:                         10.16.1006
       node_guid:                      e41d:2d03:0000:df90
       sys_image_guid:                 e41d:2d03:0000:df90
       vendor_id:                      0x02c9
       vendor_part_id:                 4113
       hw_ver:                         0x0
       board_id:                       MT_1210110019
       phys_port_cnt:                  2
               port:   1
                       state:                  PORT_ACTIVE (4)
                       max_mtu:                4096 (5)
                       active_mtu:             4096 (5)
                       sm_lid:                 1
                       port_lid:               29
                       port_lmc:               0x00
                       link_layer:             InfiniBand

               port:   2
                       state:                  PORT_ACTIVE (4)
                       max_mtu:                4096 (5)
                       active_mtu:             4096 (5)
                       sm_lid:                 1
                       port_lid:               28
                       port_lmc:               0x00
                       link_layer:             InfiniBand

hca_id: mlx4_0
       transport:                      InfiniBand (0)
       fw_ver:                         2.35.5100
       node_guid:                      0cc4:7aff:ff4f:e5d0
       sys_image_guid:                 0cc4:7aff:ff4f:e5d3
       vendor_id:                      0x02c9
       vendor_part_id:                 4099
       hw_ver:                         0x0
       board_id:                       SM_2221000001000
       phys_port_cnt:                  1
               port:   1
                       state:                  PORT_ACTIVE (4)
                       max_mtu:                4096 (5)
                       active_mtu:             4096 (5)
                       sm_lid:                 1
                       port_lid:               34
                       port_lmc:               0x00
                       link_layer:             InfiniBand

When I run fio against each path individually, I get:

disk;target IP;bandwidth (KB/s);IOPS;runtime (ms)
sdn;10.218.128.17;5053682;1263420;16599
sde;10.218.202.17;5032158;1258039;16670
sdh;10.218.203.17;4993516;1248379;16799
sdk;10.218.204.17;5081848;1270462;16507
sdc;10.219.128.17;3750942;937735;22364
sdf;10.219.202.17;3746921;936730;22388
sdi;10.219.203.17;3873929;968482;21654
sdl;10.219.204.17;3841465;960366;21837
sdd;10.220.128.17;3760358;940089;22308
sdg;10.220.202.17;3866252;966563;21697
sdj;10.220.203.17;3757495;939373;22325
sdm;10.220.204.17;4064051;1016012;20641

However, running ib_send_bw, I get:

# ib_send_bw -d mlx4_0 -i 1 10.218.128.17 -F --report_gbits
---------------------------------------------------------------------------------------
                   Send BW Test
Dual-port       : OFF          Device         : mlx4_0
Number of qps   : 1            Transport type : IB
Connection type : RC           Using SRQ      : OFF
TX depth        : 128
CQ Moderation   : 100
Mtu             : 2048[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs     : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x3f QPN 0x02b5 PSN 0x87274e
remote address: LID 0x22 QPN 0x0213 PSN 0xaf9232
---------------------------------------------------------------------------------------
#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 3219.835000 != 3063.531000
Test integrity may be harmed !
Warning: measured timestamp frequency 2599.95 differs from nominal 3219.84 MHz
65536      1000             50.57              50.57              0.096461
---------------------------------------------------------------------------------------
# ib_send_bw -d mlx5_0 -i 1 10.219.128.17 -F --report_gbits
---------------------------------------------------------------------------------------
                   Send BW Test
Dual-port       : OFF          Device         : mlx5_0
Number of qps   : 1            Transport type : IB
Connection type : RC           Using SRQ      : OFF
TX depth        : 128
CQ Moderation   : 100
Mtu             : 4096[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs     : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x12 QPN 0x003e PSN 0x75f1a0
remote address: LID 0x1d QPN 0x003e PSN 0x7f7f71
---------------------------------------------------------------------------------------
#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 3399.906000 != 2747.773000
Test integrity may be harmed !
Warning: measured timestamp frequency 2599.98 differs from nominal 3399.91 MHz
65536      1000             52.12              52.12              0.099414
---------------------------------------------------------------------------------------
# ib_send_bw -d mlx5_0 -i 2 10.220.128.17 -F --report_gbits
---------------------------------------------------------------------------------------
                   Send BW Test
Dual-port       : OFF          Device         : mlx5_0
Number of qps   : 1            Transport type : IB
Connection type : RC           Using SRQ      : OFF
TX depth        : 128
CQ Moderation   : 100
Mtu             : 4096[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs     : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x0f QPN 0x0041 PSN 0xb7203d
remote address: LID 0x1c QPN 0x0041 PSN 0xf8b80a
---------------------------------------------------------------------------------------
#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 3327.796000 != 1771.046000
Test integrity may be harmed !
Warning: measured timestamp frequency 2599.97 differs from nominal 3327.8 MHz
65536      1000             52.14              52.14              0.099441
---------------------------------------------------------------------------------------
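
The "Conflicting CPU frequency values" warnings above come from cpufreq
scaling changing the clock during the run; pinning the governor to
performance is one way to quiet them and steady the perftest timing
(assuming the cpufreq sysfs interface is present):

for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > $g
done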

Here I see that the ConnectX-3 card with iSER matches its ib_send_bw
performance. The Connect-IB, however, does better than the mlx4 in
ib_send_bw but performs much worse with iSER.

This is running the 4.4.4 kernel. Does anyone have ideas about what I
can do to get the iSER performance out of the Connect-IB cards?

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found] ` <CAANLjFoL5zow4f4RXP5t8LM7wsWN1OQ-hD2mtPUBTLkJ7UZ5kA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-07 12:02   ` Max Gurtovoy
       [not found]     ` <5756B7D2.5040009-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Max Gurtovoy @ 2016-06-07 12:02 UTC (permalink / raw)
  To: Robert LeBlanc, linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
> I'm trying to understand why our Connect-IB card is not performing as
> well as our ConnectX-3 card. There are 3 ports between the two cards
> and 12 paths to the iSER target which is a RAM disk.

<snip>

>
> When I run fio against each path individually, I get:

What is the scenario (bs, numjobs, iodepth) for each run?
Which target do you use? What is the backing store?

>
> disk;target IP;bandwidth,IOPs,Execution time
> sdn;10.218.128.17;5053682;1263420;16599
> sde;10.218.202.17;5032158;1258039;16670
> sdh;10.218.203.17;4993516;1248379;16799
> sdk;10.218.204.17;5081848;1270462;16507
> sdc;10.219.128.17;3750942;937735;22364
> sdf;10.219.202.17;3746921;936730;22388
> sdi;10.219.203.17;3873929;968482;21654
> sdl;10.219.204.17;3841465;960366;21837
> sdd;10.220.128.17;3760358;940089;22308
> sdg;10.220.202.17;3866252;966563;21697
> sdj;10.220.203.17;3757495;939373;22325
> sdm;10.220.204.17;4064051;1016012;20641
>
> However, running ib_send_bw, I get:
>
> # ib_send_bw -d mlx4_0 -i 1 10.218.128.17 -F --report_gbits
> ---------------------------------------------------------------------------------------
>                     Send BW Test
> Dual-port       : OFF          Device         : mlx4_0
> Number of qps   : 1            Transport type : IB
> Connection type : RC           Using SRQ      : OFF
> TX depth        : 128
> CQ Moderation   : 100
> Mtu             : 2048[B]
> Link type       : IB
> Max inline data : 0[B]
> rdma_cm QPs     : OFF
> Data ex. method : Ethernet
> ---------------------------------------------------------------------------------------
> local address: LID 0x3f QPN 0x02b5 PSN 0x87274e
> remote address: LID 0x22 QPN 0x0213 PSN 0xaf9232
> ---------------------------------------------------------------------------------------
> #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
> Conflicting CPU frequency values detected: 3219.835000 != 3063.531000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 2599.95 differs from nominal 3219.84 MHz
> 65536      1000             50.57              50.57              0.096461
> ---------------------------------------------------------------------------------------
> # ib_send_bw -d mlx5_0 -i 1 10.219.128.17 -F --report_gbits
> ---------------------------------------------------------------------------------------
>                     Send BW Test
> Dual-port       : OFF          Device         : mlx5_0
> Number of qps   : 1            Transport type : IB
> Connection type : RC           Using SRQ      : OFF
> TX depth        : 128
> CQ Moderation   : 100
> Mtu             : 4096[B]
> Link type       : IB
> Max inline data : 0[B]
> rdma_cm QPs     : OFF
> Data ex. method : Ethernet
> ---------------------------------------------------------------------------------------
> local address: LID 0x12 QPN 0x003e PSN 0x75f1a0
> remote address: LID 0x1d QPN 0x003e PSN 0x7f7f71
> ---------------------------------------------------------------------------------------
> #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
> Conflicting CPU frequency values detected: 3399.906000 != 2747.773000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 2599.98 differs from nominal 3399.91 MHz
> 65536      1000             52.12              52.12              0.099414
> ---------------------------------------------------------------------------------------
> # ib_send_bw -d mlx5_0 -i 2 10.220.128.17 -F --report_gbits
> ---------------------------------------------------------------------------------------
>                     Send BW Test
> Dual-port       : OFF          Device         : mlx5_0
> Number of qps   : 1            Transport type : IB
> Connection type : RC           Using SRQ      : OFF
> TX depth        : 128
> CQ Moderation   : 100
> Mtu             : 4096[B]
> Link type       : IB
> Max inline data : 0[B]
> rdma_cm QPs     : OFF
> Data ex. method : Ethernet
> ---------------------------------------------------------------------------------------
> local address: LID 0x0f QPN 0x0041 PSN 0xb7203d
> remote address: LID 0x1c QPN 0x0041 PSN 0xf8b80a
> ---------------------------------------------------------------------------------------
> #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
> Conflicting CPU frequency values detected: 3327.796000 != 1771.046000
> Test integrity may be harmed !
> Warning: measured timestamp frequency 2599.97 differs from nominal 3327.8 MHz
> 65536      1000             52.14              52.14              0.099441
> ---------------------------------------------------------------------------------------
>
> Here I see that the ConnectX-3 cards with iSER is matching the
> performance of the ib_send_bw. However, the Connect-IB performs better
> than the mlx4 with ib_send_bw, but performs much worse with iSER.
>
> This is running the 4.4.4 kernel. Is there some ideas of what I can do
> to get the iSER performance out of the Connect-IB cards?

Did you see this regression with a different kernel?

>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]     ` <5756B7D2.5040009-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-06-07 16:48       ` Robert LeBlanc
       [not found]         ` <CAANLjFq4CoOSbng=aPHiSsFB=1HMSwAhhLiCjt+88dzz24OT9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-07 16:48 UTC (permalink / raw)
  To: Max Gurtovoy; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

The target is LIO (same kernel) with a 200 GB RAM disk and I'm running
fio as follows:

fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
--group_reporting --minimal |  cut -d';' -f7,8,9

All of the paths are set up the same way: the noop scheduler, with
nomerges set to either 1 or 2 (it doesn't make a big difference).
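
A minimal sketch of one full sweep, assuming each path is driven
through its /dev/sdX node (the --filename flag, the device list, and
the sysfs writes are spelled out here for completeness; they are not
part of the command above):

for dev in sd{c..n}; do
    echo noop > /sys/block/$dev/queue/scheduler
    echo 2    > /sys/block/$dev/queue/nomerges
    fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt \
        --filename=/dev/$dev --group_reporting --minimal | cut -d';' -f7,8,9
done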

I started looking into this when the 4.6 kernel wasn't performing as
well as we had been able to get the 4.4 kernel to perform. I went back
to the 4.4 kernel and could not replicate the 4+ million IOPS, so I
started breaking the problem down into smaller pieces and found this
anomaly. Since there haven't been any suggestions up to this point,
I'll check other kernel versions to see if it is specific to certain
kernels. If you need more information, please let me know.

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>
>
> On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
>>
>> I'm trying to understand why our Connect-IB card is not performing as
>> well as our ConnectX-3 card. There are 3 ports between the two cards
>> and 12 paths to the iSER target which is a RAM disk.
>
>
> <snip>
>
>>
>> When I run fio against each path individually, I get:
>
>
> What is the scenario (bs, numjobs, iodepth) for each run ?
> Which target do you use ? backing store ?
>
>
>>
>> disk;target IP;bandwidth,IOPs,Execution time
>> sdn;10.218.128.17;5053682;1263420;16599
>> sde;10.218.202.17;5032158;1258039;16670
>> sdh;10.218.203.17;4993516;1248379;16799
>> sdk;10.218.204.17;5081848;1270462;16507
>> sdc;10.219.128.17;3750942;937735;22364
>> sdf;10.219.202.17;3746921;936730;22388
>> sdi;10.219.203.17;3873929;968482;21654
>> sdl;10.219.204.17;3841465;960366;21837
>> sdd;10.220.128.17;3760358;940089;22308
>> sdg;10.220.202.17;3866252;966563;21697
>> sdj;10.220.203.17;3757495;939373;22325
>> sdm;10.220.204.17;4064051;1016012;20641
>>
>> However, running ib_send_bw, I get:
>>
>> # ib_send_bw -d mlx4_0 -i 1 10.218.128.17 -F --report_gbits
>>
>> ---------------------------------------------------------------------------------------
>>                     Send BW Test
>> Dual-port       : OFF          Device         : mlx4_0
>> Number of qps   : 1            Transport type : IB
>> Connection type : RC           Using SRQ      : OFF
>> TX depth        : 128
>> CQ Moderation   : 100
>> Mtu             : 2048[B]
>> Link type       : IB
>> Max inline data : 0[B]
>> rdma_cm QPs     : OFF
>> Data ex. method : Ethernet
>>
>> ---------------------------------------------------------------------------------------
>> local address: LID 0x3f QPN 0x02b5 PSN 0x87274e
>> remote address: LID 0x22 QPN 0x0213 PSN 0xaf9232
>>
>> ---------------------------------------------------------------------------------------
>> #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]
>> MsgRate[Mpps]
>> Conflicting CPU frequency values detected: 3219.835000 != 3063.531000
>> Test integrity may be harmed !
>> Warning: measured timestamp frequency 2599.95 differs from nominal 3219.84
>> MHz
>> 65536      1000             50.57              50.57              0.096461
>>
>> ---------------------------------------------------------------------------------------
>> # ib_send_bw -d mlx5_0 -i 1 10.219.128.17 -F --report_gbits
>>
>> ---------------------------------------------------------------------------------------
>>                     Send BW Test
>> Dual-port       : OFF          Device         : mlx5_0
>> Number of qps   : 1            Transport type : IB
>> Connection type : RC           Using SRQ      : OFF
>> TX depth        : 128
>> CQ Moderation   : 100
>> Mtu             : 4096[B]
>> Link type       : IB
>> Max inline data : 0[B]
>> rdma_cm QPs     : OFF
>> Data ex. method : Ethernet
>>
>> ---------------------------------------------------------------------------------------
>> local address: LID 0x12 QPN 0x003e PSN 0x75f1a0
>> remote address: LID 0x1d QPN 0x003e PSN 0x7f7f71
>>
>> ---------------------------------------------------------------------------------------
>> #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]
>> MsgRate[Mpps]
>> Conflicting CPU frequency values detected: 3399.906000 != 2747.773000
>> Test integrity may be harmed !
>> Warning: measured timestamp frequency 2599.98 differs from nominal 3399.91
>> MHz
>> 65536      1000             52.12              52.12              0.099414
>>
>> ---------------------------------------------------------------------------------------
>> # ib_send_bw -d mlx5_0 -i 2 10.220.128.17 -F --report_gbits
>>
>> ---------------------------------------------------------------------------------------
>>                     Send BW Test
>> Dual-port       : OFF          Device         : mlx5_0
>> Number of qps   : 1            Transport type : IB
>> Connection type : RC           Using SRQ      : OFF
>> TX depth        : 128
>> CQ Moderation   : 100
>> Mtu             : 4096[B]
>> Link type       : IB
>> Max inline data : 0[B]
>> rdma_cm QPs     : OFF
>> Data ex. method : Ethernet
>>
>> ---------------------------------------------------------------------------------------
>> local address: LID 0x0f QPN 0x0041 PSN 0xb7203d
>> remote address: LID 0x1c QPN 0x0041 PSN 0xf8b80a
>>
>> ---------------------------------------------------------------------------------------
>> #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]
>> MsgRate[Mpps]
>> Conflicting CPU frequency values detected: 3327.796000 != 1771.046000
>> Test integrity may be harmed !
>> Warning: measured timestamp frequency 2599.97 differs from nominal 3327.8
>> MHz
>> 65536      1000             52.14              52.14              0.099441
>>
>> ---------------------------------------------------------------------------------------
>>
>> Here I see that the ConnectX-3 cards with iSER is matching the
>> performance of the ib_send_bw. However, the Connect-IB performs better
>> than the mlx4 with ib_send_bw, but performs much worse with iSER.
>>
>> This is running the 4.4.4 kernel. Is there some ideas of what I can do
>> to get the iSER performance out of the Connect-IB cards?
>
>
> did you see this regression in different kernel ?
>
>
>>
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]         ` <CAANLjFq4CoOSbng=aPHiSsFB=1HMSwAhhLiCjt+88dzz24OT9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-07 22:37           ` Robert LeBlanc
       [not found]             ` <CAANLjFoLJNQWtHHqjHmhc0iBq14NAV_GgkbyQabjzyeN56t+Ow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-07 22:37 UTC (permalink / raw)
  To: Max Gurtovoy; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On the 4.1.15 kernel:
sdc;10.218.128.17;3971878;992969;21120
sdd;10.218.202.17;3967745;991936;21142
sdg;10.218.203.17;3938128;984532;21301
sdk;10.218.204.17;3952602;988150;21223
sdn;10.219.128.17;4615719;1153929;18174
sdf;10.219.202.17;4622331;1155582;18148
sdi;10.219.203.17;4602297;1150574;18227
sdl;10.219.204.17;4565477;1141369;18374
sde;10.220.128.17;4594986;1148746;18256
sdh;10.220.202.17;4590209;1147552;18275
sdj;10.220.203.17;4599017;1149754;18240
sdm;10.220.204.17;4610898;1152724;18193

On the 4.6.0 kernel:
sdc;10.218.128.17;3239219;809804;25897
sdf;10.218.202.17;3321300;830325;25257
sdm;10.218.203.17;3339015;834753;25123
sdk;10.218.204.17;3637573;909393;23061
sde;10.219.128.17;3325777;831444;25223
sdl;10.219.202.17;3305464;826366;25378
sdg;10.219.203.17;3304032;826008;25389
sdn;10.219.204.17;3330001;832500;25191
sdd;10.220.128.17;4624370;1156092;18140
sdi;10.220.202.17;4619277;1154819;18160
sdj;10.220.203.17;4610138;1152534;18196
sdh;10.220.204.17;4586445;1146611;18290

It seems that there are a lot of changes between these kernels. I
already have these kernels on the box, and I can bisect between them
if you think it would help. It is really odd that port 2 on the
Connect-IB card did better than port 1 on the 4.6.0 kernel.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Jun 7, 2016 at 10:48 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
> The target is LIO (same kernel) with a 200 GB RAM disk and I'm running
> fio as follows:
>
> fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
> --group_reporting --minimal |  cut -d';' -f7,8,9
>
> All of the paths are set the same with noop and nomerges to either 1
> or 2 (doesn't make a big difference).
>
> I started looking into this when the 4.6 kernel wasn't performing as
> well as we were able to get the 4.4 kernel to work. I went back to the
> 4.4 kernel and I could not replicate the 4+ million IOPs. So I started
> breaking down the problem to smaller pieces and found this anomaly.
> Since there hasn't been any suggestions up to this point, I'll check
> other kernel version to see if it is specific to certain kernels. If
> you need more information, please let me know.
>
> Thanks,
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>
>>
>> On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
>>>
>>> I'm trying to understand why our Connect-IB card is not performing as
>>> well as our ConnectX-3 card. There are 3 ports between the two cards
>>> and 12 paths to the iSER target which is a RAM disk.
>>
>>
>> <snip>
>>
>>>
>>> When I run fio against each path individually, I get:
>>
>>
>> What is the scenario (bs, numjobs, iodepth) for each run ?
>> Which target do you use ? backing store ?
>>
>>
>>>
>>> disk;target IP;bandwidth,IOPs,Execution time
>>> sdn;10.218.128.17;5053682;1263420;16599
>>> sde;10.218.202.17;5032158;1258039;16670
>>> sdh;10.218.203.17;4993516;1248379;16799
>>> sdk;10.218.204.17;5081848;1270462;16507
>>> sdc;10.219.128.17;3750942;937735;22364
>>> sdf;10.219.202.17;3746921;936730;22388
>>> sdi;10.219.203.17;3873929;968482;21654
>>> sdl;10.219.204.17;3841465;960366;21837
>>> sdd;10.220.128.17;3760358;940089;22308
>>> sdg;10.220.202.17;3866252;966563;21697
>>> sdj;10.220.203.17;3757495;939373;22325
>>> sdm;10.220.204.17;4064051;1016012;20641
>>>
>>> However, running ib_send_bw, I get:
>>>
>>> # ib_send_bw -d mlx4_0 -i 1 10.218.128.17 -F --report_gbits
>>>
>>> ---------------------------------------------------------------------------------------
>>>                     Send BW Test
>>> Dual-port       : OFF          Device         : mlx4_0
>>> Number of qps   : 1            Transport type : IB
>>> Connection type : RC           Using SRQ      : OFF
>>> TX depth        : 128
>>> CQ Moderation   : 100
>>> Mtu             : 2048[B]
>>> Link type       : IB
>>> Max inline data : 0[B]
>>> rdma_cm QPs     : OFF
>>> Data ex. method : Ethernet
>>>
>>> ---------------------------------------------------------------------------------------
>>> local address: LID 0x3f QPN 0x02b5 PSN 0x87274e
>>> remote address: LID 0x22 QPN 0x0213 PSN 0xaf9232
>>>
>>> ---------------------------------------------------------------------------------------
>>> #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]
>>> MsgRate[Mpps]
>>> Conflicting CPU frequency values detected: 3219.835000 != 3063.531000
>>> Test integrity may be harmed !
>>> Warning: measured timestamp frequency 2599.95 differs from nominal 3219.84
>>> MHz
>>> 65536      1000             50.57              50.57              0.096461
>>>
>>> ---------------------------------------------------------------------------------------
>>> # ib_send_bw -d mlx5_0 -i 1 10.219.128.17 -F --report_gbits
>>>
>>> ---------------------------------------------------------------------------------------
>>>                     Send BW Test
>>> Dual-port       : OFF          Device         : mlx5_0
>>> Number of qps   : 1            Transport type : IB
>>> Connection type : RC           Using SRQ      : OFF
>>> TX depth        : 128
>>> CQ Moderation   : 100
>>> Mtu             : 4096[B]
>>> Link type       : IB
>>> Max inline data : 0[B]
>>> rdma_cm QPs     : OFF
>>> Data ex. method : Ethernet
>>>
>>> ---------------------------------------------------------------------------------------
>>> local address: LID 0x12 QPN 0x003e PSN 0x75f1a0
>>> remote address: LID 0x1d QPN 0x003e PSN 0x7f7f71
>>>
>>> ---------------------------------------------------------------------------------------
>>> #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]
>>> MsgRate[Mpps]
>>> Conflicting CPU frequency values detected: 3399.906000 != 2747.773000
>>> Test integrity may be harmed !
>>> Warning: measured timestamp frequency 2599.98 differs from nominal 3399.91
>>> MHz
>>> 65536      1000             52.12              52.12              0.099414
>>>
>>> ---------------------------------------------------------------------------------------
>>> # ib_send_bw -d mlx5_0 -i 2 10.220.128.17 -F --report_gbits
>>>
>>> ---------------------------------------------------------------------------------------
>>>                     Send BW Test
>>> Dual-port       : OFF          Device         : mlx5_0
>>> Number of qps   : 1            Transport type : IB
>>> Connection type : RC           Using SRQ      : OFF
>>> TX depth        : 128
>>> CQ Moderation   : 100
>>> Mtu             : 4096[B]
>>> Link type       : IB
>>> Max inline data : 0[B]
>>> rdma_cm QPs     : OFF
>>> Data ex. method : Ethernet
>>>
>>> ---------------------------------------------------------------------------------------
>>> local address: LID 0x0f QPN 0x0041 PSN 0xb7203d
>>> remote address: LID 0x1c QPN 0x0041 PSN 0xf8b80a
>>>
>>> ---------------------------------------------------------------------------------------
>>> #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]
>>> MsgRate[Mpps]
>>> Conflicting CPU frequency values detected: 3327.796000 != 1771.046000
>>> Test integrity may be harmed !
>>> Warning: measured timestamp frequency 2599.97 differs from nominal 3327.8
>>> MHz
>>> 65536      1000             52.14              52.14              0.099441
>>>
>>> ---------------------------------------------------------------------------------------
>>>
>>> Here I see that the ConnectX-3 cards with iSER is matching the
>>> performance of the ib_send_bw. However, the Connect-IB performs better
>>> than the mlx4 with ib_send_bw, but performs much worse with iSER.
>>>
>>> This is running the 4.4.4 kernel. Is there some ideas of what I can do
>>> to get the iSER performance out of the Connect-IB cards?
>>
>>
>> did you see this regression in different kernel ?
>>
>>
>>>
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]             ` <CAANLjFoLJNQWtHHqjHmhc0iBq14NAV_GgkbyQabjzyeN56t+Ow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-08 13:52               ` Max Gurtovoy
       [not found]                 ` <57582336.10407-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Max Gurtovoy @ 2016-06-08 13:52 UTC (permalink / raw)
  To: Robert LeBlanc; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 6/8/2016 1:37 AM, Robert LeBlanc wrote:
> On the 4.1.15 kernel:
> sdc;10.218.128.17;3971878;992969;21120
> sdd;10.218.202.17;3967745;991936;21142
> sdg;10.218.203.17;3938128;984532;21301
> sdk;10.218.204.17;3952602;988150;21223
> sdn;10.219.128.17;4615719;1153929;18174
> sdf;10.219.202.17;4622331;1155582;18148
> sdi;10.219.203.17;4602297;1150574;18227
> sdl;10.219.204.17;4565477;1141369;18374
> sde;10.220.128.17;4594986;1148746;18256
> sdh;10.220.202.17;4590209;1147552;18275
> sdj;10.220.203.17;4599017;1149754;18240
> sdm;10.220.204.17;4610898;1152724;18193
>
> On the 4.6.0 kernel:
> sdc;10.218.128.17;3239219;809804;25897
> sdf;10.218.202.17;3321300;830325;25257
> sdm;10.218.203.17;3339015;834753;25123
> sdk;10.218.204.17;3637573;909393;23061
> sde;10.219.128.17;3325777;831444;25223
> sdl;10.219.202.17;3305464;826366;25378
> sdg;10.219.203.17;3304032;826008;25389
> sdn;10.219.204.17;3330001;832500;25191
> sdd;10.220.128.17;4624370;1156092;18140
> sdi;10.220.202.17;4619277;1154819;18160
> sdj;10.220.203.17;4610138;1152534;18196
> sdh;10.220.204.17;4586445;1146611;18290
>
> It seems that there is a lot of changes between the kernels. I had
> these kernels already on the box and I can bisect them if you think it
> would help. It is really odd that port 2 on the Connect-IB card did
> better than port 1 on the 4.6.0 kernel.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

So in these kernels you get better performance with the C-IB than with
the CX3? We need to find the bottleneck.
Can you increase the iodepth and/or block size to see if we can reach
the wire speed?
Another thing to try is loading ib_iser with always_register=N.

What is the CPU utilization on both the initiator and the target?
Did you spread the IRQ affinity?
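
Something along these lines for both suggestions (the CPU list used
for the round-robin pinning is only an example, and the iSCSI sessions
need to be logged out before the module reload):

modprobe -r ib_iser
modprobe ib_iser always_register=N

cpus=(0 2 4 6 8 10 12 14); i=0
for irq in $(awk '/mlx5/ {sub(":","",$1); print $1}' /proc/interrupts); do
    echo ${cpus[$((i % ${#cpus[@]}))]} > /proc/irq/$irq/smp_affinity_list
    i=$((i + 1))
done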

>
>
> On Tue, Jun 7, 2016 at 10:48 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>> The target is LIO (same kernel) with a 200 GB RAM disk and I'm running
>> fio as follows:
>>
>> fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
>> --group_reporting --minimal |  cut -d';' -f7,8,9
>>
>> All of the paths are set the same with noop and nomerges to either 1
>> or 2 (doesn't make a big difference).
>>
>> I started looking into this when the 4.6 kernel wasn't performing as
>> well as we were able to get the 4.4 kernel to work. I went back to the
>> 4.4 kernel and I could not replicate the 4+ million IOPs. So I started
>> breaking down the problem to smaller pieces and found this anomaly.
>> Since there hasn't been any suggestions up to this point, I'll check
>> other kernel version to see if it is specific to certain kernels. If
>> you need more information, please let me know.
>>
>> Thanks,
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>>
>>> On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
>>>>
>>>> I'm trying to understand why our Connect-IB card is not performing as
>>>> well as our ConnectX-3 card. There are 3 ports between the two cards
>>>> and 12 paths to the iSER target which is a RAM disk.
>>>
>>>
>>> <snip>
>>>
>>>>
>>>> When I run fio against each path individually, I get:
>>>
>>>
>>> What is the scenario (bs, numjobs, iodepth) for each run ?
>>> Which target do you use ? backing store ?
>>>
>>>
>>>>
>>>> disk;target IP;bandwidth,IOPs,Execution time
>>>> sdn;10.218.128.17;5053682;1263420;16599
>>>> sde;10.218.202.17;5032158;1258039;16670
>>>> sdh;10.218.203.17;4993516;1248379;16799
>>>> sdk;10.218.204.17;5081848;1270462;16507
>>>> sdc;10.219.128.17;3750942;937735;22364
>>>> sdf;10.219.202.17;3746921;936730;22388
>>>> sdi;10.219.203.17;3873929;968482;21654
>>>> sdl;10.219.204.17;3841465;960366;21837
>>>> sdd;10.220.128.17;3760358;940089;22308
>>>> sdg;10.220.202.17;3866252;966563;21697
>>>> sdj;10.220.203.17;3757495;939373;22325
>>>> sdm;10.220.204.17;4064051;1016012;20641
>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                 ` <57582336.10407-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-06-08 15:33                   ` Robert LeBlanc
  2016-06-10 21:36                     ` Robert LeBlanc
  0 siblings, 1 reply; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-08 15:33 UTC (permalink / raw)
  To: Max Gurtovoy; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

With 4.1.15, the C-IB card gets about 1.15 MIOPs, while the CX3 gets
about 0.99 MIOPs. But starting with the 4.4.4 kernel, the C-IB card
drops to 0.96 MIOPs and the CX3 card jumps to 1.25 MIOPs. In the 4.6.0
kernel, both cards drop, the C-IB to 0.82 MIOPs and the CX3 to 1.15
MIOPs. I confirmed this morning that the card order was swapped on the
4.6.0 kernel and it was not different ports of the C-IB performing
differently, but different cards.

Given the limitations of the PCIe 8x port for the CX3, I think 1.25
MIOPs is about the best we can do there. In summary, the performance
of the C-IB card drops after 4.1.15 and gets progressively worse as
the kernels increase. The CX3 card peaks at the 4.4.4 kernel and
degrades a bit on the 4.6.0 kernel.

Increasing the IO depth by adding jobs does not improve performance;
it actually decreases it. Based on an average of 4 runs at each job
count from 1 to 80, the Goldilocks zone is 31-57 jobs, where the
difference in performance is less than 1%.

Similarly, increasing the block size does not bring the figures any
closer to line speed.

Here is the output on the 4.6.0 kernel with a 4M block size:
sdc;10.218.128.17;3354638;819;25006
sdf;10.218.202.17;3376920;824;24841
sdm;10.218.203.17;3367431;822;24911
sdk;10.218.204.17;3378960;824;24826
sde;10.219.128.17;3366350;821;24919
sdl;10.219.202.17;3379641;825;24821
sdg;10.219.203.17;3391254;827;24736
sdn;10.219.204.17;3401706;830;24660
sdd;10.220.128.17;4597505;1122;18246
sdi;10.220.202.17;4594231;1121;18259
sdj;10.220.203.17;4667598;1139;17972
sdh;10.220.204.17;4628197;1129;18125

On the target, the busiest thread is a kworker at 96% CPU, but no
single processor is over 15% utilized. On the initiator, fio CPU
utilization is low (<10%) for each job and no single CPU is over 22%
utilized.

I have tried manually spreading the IRQ affinity over the processors
of the respective NUMA nodes, and there was no noticeable change in
performance when doing so.

Loading ib_iser with always_register=N on the initiator shows perhaps
a slight increase in performance:

sdc;10.218.128.17;3396885;849221;24695
sdf;10.218.202.17;3429240;857310;24462
sdi;10.218.203.17;3454234;863558;24285
sdm;10.218.204.17;3391666;847916;24733
sde;10.219.128.17;3403914;850978;24644
sdh;10.219.202.17;3491034;872758;24029
sdk;10.219.203.17;3390569;847642;24741
sdl;10.219.204.17;3498898;874724;23975
sdd;10.220.128.17;4664743;1166185;17983
sdg;10.220.202.17;4624880;1156220;18138
sdj;10.220.203.17;4616227;1154056;18172
sdn;10.220.204.17;4619786;1154946;18158

I'd like to see the C-IB card at 1.25+ MIOPs (I know the target can
deliver that, and we were limited on the CX3 by the PCIe bus, which
isn't an issue for a single port on the 16x C-IB card). Although the
loss of performance in the CX3 card is concerning, I'm mostly focused
on the C-IB card at the moment. I will probably start bisecting
between 4.1.15 and 4.4.4 to see if I can identify where the
performance of the C-IB card degrades.
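
Roughly this, with the mainline tags standing in for the stable
releases mentioned above (each step means building and booting the
suggested commit and re-running the fio sweep):

git bisect start
git bisect good v4.1
git bisect bad v4.4
# build, boot, run the per-path fio sweep, then mark the result:
git bisect good    # or: git bisect bad
# repeat until git reports the first bad commit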
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jun 8, 2016 at 7:52 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>
>
> On 6/8/2016 1:37 AM, Robert LeBlanc wrote:
>>
>> On the 4.1.15 kernel:
>> sdc;10.218.128.17;3971878;992969;21120
>> sdd;10.218.202.17;3967745;991936;21142
>> sdg;10.218.203.17;3938128;984532;21301
>> sdk;10.218.204.17;3952602;988150;21223
>> sdn;10.219.128.17;4615719;1153929;18174
>> sdf;10.219.202.17;4622331;1155582;18148
>> sdi;10.219.203.17;4602297;1150574;18227
>> sdl;10.219.204.17;4565477;1141369;18374
>> sde;10.220.128.17;4594986;1148746;18256
>> sdh;10.220.202.17;4590209;1147552;18275
>> sdj;10.220.203.17;4599017;1149754;18240
>> sdm;10.220.204.17;4610898;1152724;18193
>>
>> On the 4.6.0 kernel:
>> sdc;10.218.128.17;3239219;809804;25897
>> sdf;10.218.202.17;3321300;830325;25257
>> sdm;10.218.203.17;3339015;834753;25123
>> sdk;10.218.204.17;3637573;909393;23061
>> sde;10.219.128.17;3325777;831444;25223
>> sdl;10.219.202.17;3305464;826366;25378
>> sdg;10.219.203.17;3304032;826008;25389
>> sdn;10.219.204.17;3330001;832500;25191
>> sdd;10.220.128.17;4624370;1156092;18140
>> sdi;10.220.202.17;4619277;1154819;18160
>> sdj;10.220.203.17;4610138;1152534;18196
>> sdh;10.220.204.17;4586445;1146611;18290
>>
>> It seems that there is a lot of changes between the kernels. I had
>> these kernels already on the box and I can bisect them if you think it
>> would help. It is really odd that port 2 on the Connect-IB card did
>> better than port 1 on the 4.6.0 kernel.
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> so in these kernels you get better performance with the C-IB than CX3 ?
> we need to find the bottleneck.
> Can you increase the iodepth and/or block size to see if we can reach the
> wire speed.
> another try is to load ib_iser with always_register=N.
>
> what is the cpu utilzation in both initiator/target ?
> did you spread the irq affinity ?
>
>
>>
>>
>> On Tue, Jun 7, 2016 at 10:48 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
>> wrote:
>>>
>>> The target is LIO (same kernel) with a 200 GB RAM disk and I'm running
>>> fio as follows:
>>>
>>> fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
>>> --group_reporting --minimal |  cut -d';' -f7,8,9
>>>
>>> All of the paths are set the same with noop and nomerges to either 1
>>> or 2 (doesn't make a big difference).
>>>
>>> I started looking into this when the 4.6 kernel wasn't performing as
>>> well as we were able to get the 4.4 kernel to work. I went back to the
>>> 4.4 kernel and I could not replicate the 4+ million IOPs. So I started
>>> breaking down the problem to smaller pieces and found this anomaly.
>>> Since there hasn't been any suggestions up to this point, I'll check
>>> other kernel version to see if it is specific to certain kernels. If
>>> you need more information, please let me know.
>>>
>>> Thanks,
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>
>>>
>>> On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>>
>>>>
>>>>
>>>> On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
>>>>>
>>>>>
>>>>> I'm trying to understand why our Connect-IB card is not performing as
>>>>> well as our ConnectX-3 card. There are 3 ports between the two cards
>>>>> and 12 paths to the iSER target which is a RAM disk.
>>>>
>>>>
>>>>
>>>> <snip>
>>>>
>>>>>
>>>>> When I run fio against each path individually, I get:
>>>>
>>>>
>>>>
>>>> What is the scenario (bs, numjobs, iodepth) for each run ?
>>>> Which target do you use ? backing store ?
>>>>
>>>>
>>>>>
>>>>> disk;target IP;bandwidth,IOPs,Execution time
>>>>> sdn;10.218.128.17;5053682;1263420;16599
>>>>> sde;10.218.202.17;5032158;1258039;16670
>>>>> sdh;10.218.203.17;4993516;1248379;16799
>>>>> sdk;10.218.204.17;5081848;1270462;16507
>>>>> sdc;10.219.128.17;3750942;937735;22364
>>>>> sdf;10.219.202.17;3746921;936730;22388
>>>>> sdi;10.219.203.17;3873929;968482;21654
>>>>> sdl;10.219.204.17;3841465;960366;21837
>>>>> sdd;10.220.128.17;3760358;940089;22308
>>>>> sdg;10.220.202.17;3866252;966563;21697
>>>>> sdj;10.220.203.17;3757495;939373;22325
>>>>> sdm;10.220.204.17;4064051;1016012;20641
>>>>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Connect-IB not performing as well as ConnectX-3 with iSER
  2016-06-08 15:33                   ` Robert LeBlanc
@ 2016-06-10 21:36                     ` Robert LeBlanc
       [not found]                       ` <CAANLjFrv-0VArTEkgqbrhzFjn1fg_egpCJuQZnAurVrHjbL_qA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-10 21:36 UTC (permalink / raw)
  To: Max Gurtovoy; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

I bisected the kernel, and it looks like the performance of the
Connect-IB card goes down and the performance of the ConnectX-3 card
goes up with this commit (though I'm not sure why it would have this
effect):

ab46db0a3325a064bb24e826b12995d157565efb is the first bad commit
commit ab46db0a3325a064bb24e826b12995d157565efb
Author: Jiri Olsa <jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date:   Thu Dec 3 10:06:43 2015 +0100

   perf stat: Use perf_evlist__enable in handle_initial_delay

   No need to mimic the behaviour of perf_evlist__enable, we can use it
   directly.

   Signed-off-by: Jiri Olsa <jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
   Tested-by: Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
   Cc: Adrian Hunter <adrian.hunter-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
   Cc: David Ahern <dsahern-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
   Cc: Namhyung Kim <namhyung-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
   Cc: Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>
   Link: http://lkml.kernel.org/r/1449133606-14429-5-git-send-email-jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
   Signed-off-by: Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

:040000 040000 67e69893bf6d47b372e08d7089d37a7b9f602fa7
b63d9b366f078eabf86f4da3d1cc53ae7434a949 M      tools

4.4.0_rc2_3e27c920
sdc;10.218.128.17;5291495;1322873;15853
sde;10.218.202.17;4966024;1241506;16892
sdh;10.218.203.17;4980471;1245117;16843
sdk;10.218.204.17;4966612;1241653;16890
sdd;10.219.128.17;5060084;1265021;16578
sdf;10.219.202.17;5065278;1266319;16561
sdi;10.219.203.17;5047600;1261900;16619
sdl;10.219.204.17;5036992;1259248;16654
sdn;10.220.128.17;3775081;943770;22221
sdg;10.220.202.17;3758336;939584;22320
sdj;10.220.203.17;3792832;948208;22117
sdm;10.220.204.17;3771516;942879;22242

4.4.0_rc2_ab46db0a
sdc;10.218.128.17;3792146;948036;22121
sdf;10.218.202.17;3738405;934601;22439
sdj;10.218.203.17;3764239;941059;22285
sdl;10.218.204.17;3785302;946325;22161
sdd;10.219.128.17;3762382;940595;22296
sdg;10.219.202.17;3765760;941440;22276
sdi;10.219.203.17;3873751;968437;21655
sdm;10.219.204.17;3769483;942370;22254
sde;10.220.128.17;5022517;1255629;16702
sdh;10.220.202.17;5018911;1254727;16714
sdk;10.220.203.17;5037295;1259323;16653
sdn;10.220.204.17;5033064;1258266;16667

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jun 8, 2016 at 9:33 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
> With 4.1.15, the C-IB card gets about 1.15 MIOPs, while the CX3 gets
> about 0.99 MIOPs. But starting with the 4.4.4 kernel, the C-IB card
> drops to 0.96 MIOPs and the CX3 card jumps to 1.25 MIOPs. In the 4.6.0
> kernel, both cards drop, the C-IB to 0.82 MIOPs and the CX3 to 1.15
> MIOPs. I confirmed this morning that the card order was swapped on the
> 4.6.0 kernel and it was not different ports of the C-IB performing
> differently, but different cards.
>
> Given the limitations of the PCIe 8x port for the CX3, I think 1.25
> MIOPs is about the best we can do there. In summary, the performance
> of the C-IB card drops after 4.1.15 and gets progressively worse as
> the kernels increase. The CX3 card peaks at the 4.4.4 kernel and
> degrades a bit on the 4.6.0 kernel.
>
> Increasing the IO depth by adding jobs does not improve performance,
> it actually decreases performance. Based on an average of 4 runs at
> each job number from 1-80, the Goldilocks zone is 31-57 jobs where the
> difference in performance is less than 1%.
>
> Similarly, increasing block request size does not really change the
> figures to reach line speed.
>
> Here is the output of the 4.6.0 kernel with 4M bs:
> sdc;10.218.128.17;3354638;819;25006
> sdf;10.218.202.17;3376920;824;24841
> sdm;10.218.203.17;3367431;822;24911
> sdk;10.218.204.17;3378960;824;24826
> sde;10.219.128.17;3366350;821;24919
> sdl;10.219.202.17;3379641;825;24821
> sdg;10.219.203.17;3391254;827;24736
> sdn;10.219.204.17;3401706;830;24660
> sdd;10.220.128.17;4597505;1122;18246
> sdi;10.220.202.17;4594231;1121;18259
> sdj;10.220.203.17;4667598;1139;17972
> sdh;10.220.204.17;4628197;1129;18125
>
> The CPU on the target is a kworker thread at 96%, but no single
> processor over 15%. The initiator has low fio CPU utilization (<10%)
> for each job and no single CPU over 22% utilized.
>
> I have tried manually spreading the IRQ affinity over the processors
> of the respective NUMA nodes and there was no noticeable change in
> performance when doing so.
>
> Loading ib_iser on the initiator shows maybe a slight increase in performance:
>
> sdc;10.218.128.17;3396885;849221;24695
> sdf;10.218.202.17;3429240;857310;24462
> sdi;10.218.203.17;3454234;863558;24285
> sdm;10.218.204.17;3391666;847916;24733
> sde;10.219.128.17;3403914;850978;24644
> sdh;10.219.202.17;3491034;872758;24029
> sdk;10.219.203.17;3390569;847642;24741
> sdl;10.219.204.17;3498898;874724;23975
> sdd;10.220.128.17;4664743;1166185;17983
> sdg;10.220.202.17;4624880;1156220;18138
> sdj;10.220.203.17;4616227;1154056;18172
> sdn;10.220.204.17;4619786;1154946;18158
>
> I'd like to see the C-IB card at 1.25+ MIOPs (I know that the target
> can do that performance and we were limited on the CX3 by the PCIe bus
> which isn't an issue with the 16x C-IB card for a single port).
> Although the loss of performance in the CX3 card is concerning, I'm
> mostly focused on the C-IB card at the moment. I will probably start
> bisecting 4.1.15 to 4.4.4 to see if I can identify when the
> performance of the C-IB card degrades.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Jun 8, 2016 at 7:52 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>
>>
>> On 6/8/2016 1:37 AM, Robert LeBlanc wrote:
>>>
>>> On the 4.1.15 kernel:
>>> sdc;10.218.128.17;3971878;992969;21120
>>> sdd;10.218.202.17;3967745;991936;21142
>>> sdg;10.218.203.17;3938128;984532;21301
>>> sdk;10.218.204.17;3952602;988150;21223
>>> sdn;10.219.128.17;4615719;1153929;18174
>>> sdf;10.219.202.17;4622331;1155582;18148
>>> sdi;10.219.203.17;4602297;1150574;18227
>>> sdl;10.219.204.17;4565477;1141369;18374
>>> sde;10.220.128.17;4594986;1148746;18256
>>> sdh;10.220.202.17;4590209;1147552;18275
>>> sdj;10.220.203.17;4599017;1149754;18240
>>> sdm;10.220.204.17;4610898;1152724;18193
>>>
>>> On the 4.6.0 kernel:
>>> sdc;10.218.128.17;3239219;809804;25897
>>> sdf;10.218.202.17;3321300;830325;25257
>>> sdm;10.218.203.17;3339015;834753;25123
>>> sdk;10.218.204.17;3637573;909393;23061
>>> sde;10.219.128.17;3325777;831444;25223
>>> sdl;10.219.202.17;3305464;826366;25378
>>> sdg;10.219.203.17;3304032;826008;25389
>>> sdn;10.219.204.17;3330001;832500;25191
>>> sdd;10.220.128.17;4624370;1156092;18140
>>> sdi;10.220.202.17;4619277;1154819;18160
>>> sdj;10.220.203.17;4610138;1152534;18196
>>> sdh;10.220.204.17;4586445;1146611;18290
>>>
>>> It seems that there is a lot of changes between the kernels. I had
>>> these kernels already on the box and I can bisect them if you think it
>>> would help. It is really odd that port 2 on the Connect-IB card did
>>> better than port 1 on the 4.6.0 kernel.
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> so in these kernels you get better performance with the C-IB than CX3 ?
>> we need to find the bottleneck.
>> Can you increase the iodepth and/or block size to see if we can reach the
>> wire speed.
>> another try is to load ib_iser with always_register=N.
>>
>> what is the cpu utilzation in both initiator/target ?
>> did you spread the irq affinity ?
>>
>>
>>>
>>>
>>> On Tue, Jun 7, 2016 at 10:48 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
>>> wrote:
>>>>
>>>> The target is LIO (same kernel) with a 200 GB RAM disk and I'm running
>>>> fio as follows:
>>>>
>>>> fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
>>>> --group_reporting --minimal |  cut -d';' -f7,8,9
>>>>
>>>> All of the paths are set the same with noop and nomerges to either 1
>>>> or 2 (doesn't make a big difference).
>>>>
>>>> I started looking into this when the 4.6 kernel wasn't performing as
>>>> well as we were able to get the 4.4 kernel to work. I went back to the
>>>> 4.4 kernel and I could not replicate the 4+ million IOPs. So I started
>>>> breaking down the problem to smaller pieces and found this anomaly.
>>>> Since there hasn't been any suggestions up to this point, I'll check
>>>> other kernel version to see if it is specific to certain kernels. If
>>>> you need more information, please let me know.
>>>>
>>>> Thanks,
>>>> ----------------
>>>> Robert LeBlanc
>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>
>>>>
>>>> On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
>>>>>>
>>>>>>
>>>>>> I'm trying to understand why our Connect-IB card is not performing as
>>>>>> well as our ConnectX-3 card. There are 3 ports between the two cards
>>>>>> and 12 paths to the iSER target which is a RAM disk.
>>>>>
>>>>>
>>>>>
>>>>> <snip>
>>>>>
>>>>>>
>>>>>> When I run fio against each path individually, I get:
>>>>>
>>>>>
>>>>>
>>>>> What is the scenario (bs, numjobs, iodepth) for each run ?
>>>>> Which target do you use ? backing store ?
>>>>>
>>>>>
>>>>>>
>>>>>> disk;target IP;bandwidth,IOPs,Execution time
>>>>>> sdn;10.218.128.17;5053682;1263420;16599
>>>>>> sde;10.218.202.17;5032158;1258039;16670
>>>>>> sdh;10.218.203.17;4993516;1248379;16799
>>>>>> sdk;10.218.204.17;5081848;1270462;16507
>>>>>> sdc;10.219.128.17;3750942;937735;22364
>>>>>> sdf;10.219.202.17;3746921;936730;22388
>>>>>> sdi;10.219.203.17;3873929;968482;21654
>>>>>> sdl;10.219.204.17;3841465;960366;21837
>>>>>> sdd;10.220.128.17;3760358;940089;22308
>>>>>> sdg;10.220.202.17;3866252;966563;21697
>>>>>> sdj;10.220.203.17;3757495;939373;22325
>>>>>> sdm;10.220.204.17;4064051;1016012;20641
>>>>>>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                       ` <CAANLjFrv-0VArTEkgqbrhzFjn1fg_egpCJuQZnAurVrHjbL_qA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-20 15:23                         ` Robert LeBlanc
       [not found]                           ` <CAANLjFqoV-5HK0c+LdEbuxd81Vm=g=WE3cQgp47dH-yfYjZjGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-20 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA
  Cc: Max Gurtovoy

Adding linux-scsi

This last week I tried to figure out where a 10-15% decrease in
performance showed up between 4.5 and 4.6 using iSER with ConnectX-3
and Connect-IB cards (10.{218,219}.*.17 are Connect-IB and 10.220.*.17
are ConnectX-3). To review: straight RDMA transfers between the cards
achieved line rate; it is only iSER that was not able to reach those
same rates for some cards on some kernels.

4.5 vanilla default config
sdc;10.218.128.17;3800048;950012;22075
sdi;10.218.202.17;3757158;939289;22327
sdg;10.218.203.17;3774062;943515;22227
sdn;10.218.204.17;3816299;954074;21981
sdd;10.219.128.17;3821863;955465;21949
sdf;10.219.202.17;3784106;946026;22168
sdj;10.219.203.17;3827094;956773;21919
sdm;10.219.204.17;3788208;947052;22144
sde;10.220.128.17;5054596;1263649;16596
sdh;10.220.202.17;5013811;1253452;16731
sdl;10.220.203.17;5052160;1263040;16604
sdk;10.220.204.17;4990248;1247562;16810

4.6 vanilla default config
sde;10.218.128.17;3431063;857765;24449
sdf;10.218.202.17;3360685;840171;24961
sdi;10.218.203.17;3355174;838793;25002
sdm;10.218.204.17;3360955;840238;24959
sdd;10.219.128.17;3337288;834322;25136
sdh;10.219.202.17;3327492;831873;25210
sdj;10.219.203.17;3380867;845216;24812
sdk;10.219.204.17;3418340;854585;24540
sdc;10.220.128.17;4668377;1167094;17969
sdg;10.220.202.17;4716675;1179168;17785
sdl;10.220.203.17;4675663;1168915;17941
sdn;10.220.204.17;4631519;1157879;18112

I narrowed the performance degradation to the series
7861728..5e47f19, but while trying to bisect it the results were so
erratic from one commit to the next that I could not figure out
exactly which one introduced the issue. If someone could give me some
pointers on what to do, I can keep trying to dig through this.
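
For the record, the bisection itself was just the usual git bisect
over that range, roughly (rebuilding, rebooting and rerunning the fio
job at every step):

git bisect start
git bisect bad 5e47f19      # end of the suspect range
git bisect good 7861728     # start of the range, which still behaved well
# build/install the kernel, reboot, rerun fio against each path, then:
git bisect good             # or "git bisect bad" as appropriate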

4.5.0_rc5_7861728d_00001
sdc;10.218.128.17;3747591;936897;22384
sdf;10.218.202.17;3750607;937651;22366
sdh;10.218.203.17;3750439;937609;22367
sdn;10.218.204.17;3771008;942752;22245
sde;10.219.128.17;3867678;966919;21689
sdg;10.219.202.17;3781889;945472;22181
sdk;10.219.203.17;3791804;947951;22123
sdl;10.219.204.17;3795406;948851;22102
sdd;10.220.128.17;5039110;1259777;16647
sdi;10.220.202.17;4992921;1248230;16801
sdj;10.220.203.17;5015610;1253902;16725
sdm;10.220.204.17;5087087;1271771;16490

4.5.0_rc5_f81bf458_00018
sdb;10.218.128.17;5023720;1255930;16698
sde;10.218.202.17;5016809;1254202;16721
sdj;10.218.203.17;5021915;1255478;16704
sdk;10.218.204.17;5021314;1255328;16706
sdc;10.219.128.17;4984318;1246079;16830
sdf;10.219.202.17;4986096;1246524;16824
sdh;10.219.203.17;5043958;1260989;16631
sdm;10.219.204.17;5032460;1258115;16669
sdd;10.220.128.17;3736740;934185;22449
sdg;10.220.202.17;3728767;932191;22497
sdi;10.220.203.17;3752117;938029;22357
sdl;10.220.204.17;3763901;940975;22287

4.5.0_rc5_07b63196_00027
sdb;10.218.128.17;3606142;901535;23262
sdg;10.218.202.17;3570988;892747;23491
sdf;10.218.203.17;3576011;894002;23458
sdk;10.218.204.17;3558113;889528;23576
sdc;10.219.128.17;3577384;894346;23449
sde;10.219.202.17;3575401;893850;23462
sdj;10.219.203.17;3567798;891949;23512
sdl;10.219.204.17;3584262;896065;23404
sdd;10.220.128.17;4430680;1107670;18933
sdh;10.220.202.17;4488286;1122071;18690
sdi;10.220.203.17;4487326;1121831;18694
sdm;10.220.204.17;4441236;1110309;18888

4.5.0_rc5_5e47f198_00036
sdb;10.218.128.17;3519597;879899;23834
sdi;10.218.202.17;3512229;878057;23884
sdh;10.218.203.17;3518563;879640;23841
sdk;10.218.204.17;3582119;895529;23418
sdd;10.219.128.17;3550883;887720;23624
sdj;10.219.202.17;3558415;889603;23574
sde;10.219.203.17;3552086;888021;23616
sdl;10.219.204.17;3579521;894880;23435
sdc;10.220.128.17;4532912;1133228;18506
sdf;10.220.202.17;4558035;1139508;18404
sdg;10.220.203.17;4601035;1150258;18232
sdm;10.220.204.17;4548150;1137037;18444

While bisecting the kernel, I also stumbled across one build that
worked really well for both adapters, something I haven't seen in any
of the release kernels.

4.5.0_rc3_1aaa57f5_00399
sdc;10.218.128.17;4627942;1156985;18126
sdf;10.218.202.17;4590963;1147740;18272
sdk;10.218.203.17;4564980;1141245;18376
sdn;10.218.204.17;4571946;1142986;18348
sdd;10.219.128.17;4591717;1147929;18269
sdi;10.219.202.17;4505644;1126411;18618
sdg;10.219.203.17;4562001;1140500;18388
sdl;10.219.204.17;4583187;1145796;18303
sde;10.220.128.17;5511568;1377892;15220
sdh;10.220.202.17;5515555;1378888;15209
sdj;10.220.203.17;5609983;1402495;14953
sdm;10.220.204.17;5509035;1377258;15227

Here the ConnectX-3 card is performing perfectly while the Connect-IB
card still has some room for improvement.

I'd like to get to the bottom of why I'm not seeing the same
performance out of the newer kernels, but I just don't understand the
code well enough. I've done what I can to narrow down where in the
kernel the relevant changes happened, in the hope that it helps
someone on the list. If there is anything I can do to help out,
please let me know.
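
If a profile would help, I can also grab something system-wide on the
target while one of these fio runs is going, along the lines of
(assuming perf behaves on the kernel in question):

perf record -a -g -- sleep 30    # sample all CPUs during the run
perf report --sort comm,dso      # see where the time is going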

Thank you,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Jun 10, 2016 at 3:36 PM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
> I bisected the kernel and it looks like the performance of the
> Connect-IB card goes down and the performance of the ConnectX-3 card
> goes up with this commit (but I'm not sure why this would cause this):
>
> ab46db0a3325a064bb24e826b12995d157565efb is the first bad commit
> commit ab46db0a3325a064bb24e826b12995d157565efb
> Author: Jiri Olsa <jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Date:   Thu Dec 3 10:06:43 2015 +0100
>
>    perf stat: Use perf_evlist__enable in handle_initial_delay
>
>    No need to mimic the behaviour of perf_evlist__enable, we can use it
>    directly.
>
>    Signed-off-by: Jiri Olsa <jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>    Tested-by: Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>    Cc: Adrian Hunter <adrian.hunter-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>    Cc: David Ahern <dsahern-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>    Cc: Namhyung Kim <namhyung-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>    Cc: Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>
>    Link: http://lkml.kernel.org/r/1449133606-14429-5-git-send-email-jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
>    Signed-off-by: Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>
> :040000 040000 67e69893bf6d47b372e08d7089d37a7b9f602fa7
> b63d9b366f078eabf86f4da3d1cc53ae7434a949 M      tools
>
> 4.4.0_rc2_3e27c920
> sdc;10.218.128.17;5291495;1322873;15853
> sde;10.218.202.17;4966024;1241506;16892
> sdh;10.218.203.17;4980471;1245117;16843
> sdk;10.218.204.17;4966612;1241653;16890
> sdd;10.219.128.17;5060084;1265021;16578
> sdf;10.219.202.17;5065278;1266319;16561
> sdi;10.219.203.17;5047600;1261900;16619
> sdl;10.219.204.17;5036992;1259248;16654
> sdn;10.220.128.17;3775081;943770;22221
> sdg;10.220.202.17;3758336;939584;22320
> sdj;10.220.203.17;3792832;948208;22117
> sdm;10.220.204.17;3771516;942879;22242
>
> 4.4.0_rc2_ab46db0a
> sdc;10.218.128.17;3792146;948036;22121
> sdf;10.218.202.17;3738405;934601;22439
> sdj;10.218.203.17;3764239;941059;22285
> sdl;10.218.204.17;3785302;946325;22161
> sdd;10.219.128.17;3762382;940595;22296
> sdg;10.219.202.17;3765760;941440;22276
> sdi;10.219.203.17;3873751;968437;21655
> sdm;10.219.204.17;3769483;942370;22254
> sde;10.220.128.17;5022517;1255629;16702
> sdh;10.220.202.17;5018911;1254727;16714
> sdk;10.220.203.17;5037295;1259323;16653
> sdn;10.220.204.17;5033064;1258266;16667
>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Jun 8, 2016 at 9:33 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>> With 4.1.15, the C-IB card gets about 1.15 MIOPs, while the CX3 gets
>> about 0.99 MIOPs. But starting with the 4.4.4 kernel, the C-IB card
>> drops to 0.96 MIOPs and the CX3 card jumps to 1.25 MIOPs. In the 4.6.0
>> kernel, both cards drop, the C-IB to 0.82 MIOPs and the CX3 to 1.15
>> MIOPs. I confirmed this morning that the card order was swapped on the
>> 4.6.0 kernel and it was not different ports of the C-IB performing
>> differently, but different cards.
>>
>> Given the limitations of the PCIe 8x port for the CX3, I think 1.25
>> MIOPs is about the best we can do there. In summary, the performance
>> of the C-IB card drops after 4.1.15 and gets progressively worse as
>> the kernels increase. The CX3 card peaks at the 4.4.4 kernel and
>> degrades a bit on the 4.6.0 kernel.
>>
>> Increasing the IO depth by adding jobs does not improve performance,
>> it actually decreases performance. Based on an average of 4 runs at
>> each job number from 1-80, the Goldilocks zone is 31-57 jobs where the
>> difference in performance is less than 1%.
>>
>> Similarly, increasing block request size does not really change the
>> figures to reach line speed.
>>
>> Here is the output of the 4.6.0 kernel with 4M bs:
>> sdc;10.218.128.17;3354638;819;25006
>> sdf;10.218.202.17;3376920;824;24841
>> sdm;10.218.203.17;3367431;822;24911
>> sdk;10.218.204.17;3378960;824;24826
>> sde;10.219.128.17;3366350;821;24919
>> sdl;10.219.202.17;3379641;825;24821
>> sdg;10.219.203.17;3391254;827;24736
>> sdn;10.219.204.17;3401706;830;24660
>> sdd;10.220.128.17;4597505;1122;18246
>> sdi;10.220.202.17;4594231;1121;18259
>> sdj;10.220.203.17;4667598;1139;17972
>> sdh;10.220.204.17;4628197;1129;18125
>>
>> The CPU on the target is a kworker thread at 96%, but no single
>> processor over 15%. The initiator has low fio CPU utilization (<10%)
>> for each job and no single CPU over 22% utilized.
>>
>> I have tried manually spreading the IRQ affinity over the processors
>> of the respective NUMA nodes and there was no noticeable change in
>> performance when doing so.
>>
>> Loading ib_iser on the initiator shows maybe a slight increase in performance:
>>
>> sdc;10.218.128.17;3396885;849221;24695
>> sdf;10.218.202.17;3429240;857310;24462
>> sdi;10.218.203.17;3454234;863558;24285
>> sdm;10.218.204.17;3391666;847916;24733
>> sde;10.219.128.17;3403914;850978;24644
>> sdh;10.219.202.17;3491034;872758;24029
>> sdk;10.219.203.17;3390569;847642;24741
>> sdl;10.219.204.17;3498898;874724;23975
>> sdd;10.220.128.17;4664743;1166185;17983
>> sdg;10.220.202.17;4624880;1156220;18138
>> sdj;10.220.203.17;4616227;1154056;18172
>> sdn;10.220.204.17;4619786;1154946;18158
>>
>> I'd like to see the C-IB card at 1.25+ MIOPs (I know that the target
>> can do that performance and we were limited on the CX3 by the PCIe bus
>> which isn't an issue with the 16x C-IB card for a single port).
>> Although the loss of performance in the CX3 card is concerning, I'm
>> mostly focused on the C-IB card at the moment. I will probably start
>> bisecting 4.1.15 to 4.4.4 to see if I can identify when the
>> performance of the C-IB card degrades.
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jun 8, 2016 at 7:52 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>>
>>> On 6/8/2016 1:37 AM, Robert LeBlanc wrote:
>>>>
>>>> On the 4.1.15 kernel:
>>>> sdc;10.218.128.17;3971878;992969;21120
>>>> sdd;10.218.202.17;3967745;991936;21142
>>>> sdg;10.218.203.17;3938128;984532;21301
>>>> sdk;10.218.204.17;3952602;988150;21223
>>>> sdn;10.219.128.17;4615719;1153929;18174
>>>> sdf;10.219.202.17;4622331;1155582;18148
>>>> sdi;10.219.203.17;4602297;1150574;18227
>>>> sdl;10.219.204.17;4565477;1141369;18374
>>>> sde;10.220.128.17;4594986;1148746;18256
>>>> sdh;10.220.202.17;4590209;1147552;18275
>>>> sdj;10.220.203.17;4599017;1149754;18240
>>>> sdm;10.220.204.17;4610898;1152724;18193
>>>>
>>>> On the 4.6.0 kernel:
>>>> sdc;10.218.128.17;3239219;809804;25897
>>>> sdf;10.218.202.17;3321300;830325;25257
>>>> sdm;10.218.203.17;3339015;834753;25123
>>>> sdk;10.218.204.17;3637573;909393;23061
>>>> sde;10.219.128.17;3325777;831444;25223
>>>> sdl;10.219.202.17;3305464;826366;25378
>>>> sdg;10.219.203.17;3304032;826008;25389
>>>> sdn;10.219.204.17;3330001;832500;25191
>>>> sdd;10.220.128.17;4624370;1156092;18140
>>>> sdi;10.220.202.17;4619277;1154819;18160
>>>> sdj;10.220.203.17;4610138;1152534;18196
>>>> sdh;10.220.204.17;4586445;1146611;18290
>>>>
>>>> It seems that there is a lot of changes between the kernels. I had
>>>> these kernels already on the box and I can bisect them if you think it
>>>> would help. It is really odd that port 2 on the Connect-IB card did
>>>> better than port 1 on the 4.6.0 kernel.
>>>> ----------------
>>>> Robert LeBlanc
>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>
>>>
>>> so in these kernels you get better performance with the C-IB than CX3 ?
>>> we need to find the bottleneck.
>>> Can you increase the iodepth and/or block size to see if we can reach the
>>> wire speed.
>>> another try is to load ib_iser with always_register=N.
>>>
>>> what is the cpu utilzation in both initiator/target ?
>>> did you spread the irq affinity ?
>>>
>>>
>>>>
>>>>
>>>> On Tue, Jun 7, 2016 at 10:48 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
>>>> wrote:
>>>>>
>>>>> The target is LIO (same kernel) with a 200 GB RAM disk and I'm running
>>>>> fio as follows:
>>>>>
>>>>> fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
>>>>> --group_reporting --minimal |  cut -d';' -f7,8,9
>>>>>
>>>>> All of the paths are set the same with noop and nomerges to either 1
>>>>> or 2 (doesn't make a big difference).
>>>>>
>>>>> I started looking into this when the 4.6 kernel wasn't performing as
>>>>> well as we were able to get the 4.4 kernel to work. I went back to the
>>>>> 4.4 kernel and I could not replicate the 4+ million IOPs. So I started
>>>>> breaking down the problem to smaller pieces and found this anomaly.
>>>>> Since there hasn't been any suggestions up to this point, I'll check
>>>>> other kernel version to see if it is specific to certain kernels. If
>>>>> you need more information, please let me know.
>>>>>
>>>>> Thanks,
>>>>> ----------------
>>>>> Robert LeBlanc
>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>>
>>>>>
>>>>> On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
>>>>>>>
>>>>>>>
>>>>>>> I'm trying to understand why our Connect-IB card is not performing as
>>>>>>> well as our ConnectX-3 card. There are 3 ports between the two cards
>>>>>>> and 12 paths to the iSER target which is a RAM disk.
>>>>>>
>>>>>>
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>>
>>>>>>> When I run fio against each path individually, I get:
>>>>>>
>>>>>>
>>>>>>
>>>>>> What is the scenario (bs, numjobs, iodepth) for each run ?
>>>>>> Which target do you use ? backing store ?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> disk;target IP;bandwidth,IOPs,Execution time
>>>>>>> sdn;10.218.128.17;5053682;1263420;16599
>>>>>>> sde;10.218.202.17;5032158;1258039;16670
>>>>>>> sdh;10.218.203.17;4993516;1248379;16799
>>>>>>> sdk;10.218.204.17;5081848;1270462;16507
>>>>>>> sdc;10.219.128.17;3750942;937735;22364
>>>>>>> sdf;10.219.202.17;3746921;936730;22388
>>>>>>> sdi;10.219.203.17;3873929;968482;21654
>>>>>>> sdl;10.219.204.17;3841465;960366;21837
>>>>>>> sdd;10.220.128.17;3760358;940089;22308
>>>>>>> sdg;10.220.202.17;3866252;966563;21697
>>>>>>> sdj;10.220.203.17;3757495;939373;22325
>>>>>>> sdm;10.220.204.17;4064051;1016012;20641
>>>>>>>
>>>

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                           ` <CAANLjFqoV-5HK0c+LdEbuxd81Vm=g=WE3cQgp47dH-yfYjZjGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-20 21:27                             ` Max Gurtovoy
       [not found]                               ` <3646a0c9-3f2d-66b8-c4da-c91ca1d01cee-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-06-21 13:08                             ` Sagi Grimberg
  1 sibling, 1 reply; 20+ messages in thread
From: Max Gurtovoy @ 2016-06-20 21:27 UTC (permalink / raw)
  To: Robert LeBlanc, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

Did you see this kind of regression with SRP? Or with some other
target (e.g. TGT)?
I'm trying to understand whether it's a ULP issue or an LLD one...

On 6/20/2016 6:23 PM, Robert LeBlanc wrote:
> Adding linux-scsi
>
> This last week I tried to figure out where a 10-15% decrease in
> performance showed up between 4.5 and 4.6 using iSER and ConnectX-3
> and Connect-IB cards (10.{218,219}.*.17 are Connect-IB and 10.220.*.17
> are ConnectX-3). To review, straight RDMA transfers between cards
> showed line rate was being achieved, just iSER was not able to achieve
> those same rates for some cards on different kernels.
>
> 4.5 vanilla default config
> sdc;10.218.128.17;3800048;950012;22075
> sdi;10.218.202.17;3757158;939289;22327
> sdg;10.218.203.17;3774062;943515;22227
> sdn;10.218.204.17;3816299;954074;21981
> sdd;10.219.128.17;3821863;955465;21949
> sdf;10.219.202.17;3784106;946026;22168
> sdj;10.219.203.17;3827094;956773;21919
> sdm;10.219.204.17;3788208;947052;22144
> sde;10.220.128.17;5054596;1263649;16596
> sdh;10.220.202.17;5013811;1253452;16731
> sdl;10.220.203.17;5052160;1263040;16604
> sdk;10.220.204.17;4990248;1247562;16810
>
> 4.6 vanilla default config
> sde;10.218.128.17;3431063;857765;24449
> sdf;10.218.202.17;3360685;840171;24961
> sdi;10.218.203.17;3355174;838793;25002
> sdm;10.218.204.17;3360955;840238;24959
> sdd;10.219.128.17;3337288;834322;25136
> sdh;10.219.202.17;3327492;831873;25210
> sdj;10.219.203.17;3380867;845216;24812
> sdk;10.219.204.17;3418340;854585;24540
> sdc;10.220.128.17;4668377;1167094;17969
> sdg;10.220.202.17;4716675;1179168;17785
> sdl;10.220.203.17;4675663;1168915;17941
> sdn;10.220.204.17;4631519;1157879;18112
>
> I narrowed the performance degradation to this series
> 7861728..5e47f19, but while trying to bisect it, the changes were
> erratic between each commit that I could not figure out exactly which
> introduced the issue. If someone could give me some pointers on what
> to do, I can keep trying to dig through this.
>
> 4.5.0_rc5_7861728d_00001
> sdc;10.218.128.17;3747591;936897;22384
> sdf;10.218.202.17;3750607;937651;22366
> sdh;10.218.203.17;3750439;937609;22367
> sdn;10.218.204.17;3771008;942752;22245
> sde;10.219.128.17;3867678;966919;21689
> sdg;10.219.202.17;3781889;945472;22181
> sdk;10.219.203.17;3791804;947951;22123
> sdl;10.219.204.17;3795406;948851;22102
> sdd;10.220.128.17;5039110;1259777;16647
> sdi;10.220.202.17;4992921;1248230;16801
> sdj;10.220.203.17;5015610;1253902;16725
> sdm;10.220.204.17;5087087;1271771;16490
>
> 4.5.0_rc5_f81bf458_00018
> sdb;10.218.128.17;5023720;1255930;16698
> sde;10.218.202.17;5016809;1254202;16721
> sdj;10.218.203.17;5021915;1255478;16704
> sdk;10.218.204.17;5021314;1255328;16706
> sdc;10.219.128.17;4984318;1246079;16830
> sdf;10.219.202.17;4986096;1246524;16824
> sdh;10.219.203.17;5043958;1260989;16631
> sdm;10.219.204.17;5032460;1258115;16669
> sdd;10.220.128.17;3736740;934185;22449
> sdg;10.220.202.17;3728767;932191;22497
> sdi;10.220.203.17;3752117;938029;22357
> sdl;10.220.204.17;3763901;940975;22287
>
> 4.5.0_rc5_07b63196_00027
> sdb;10.218.128.17;3606142;901535;23262
> sdg;10.218.202.17;3570988;892747;23491
> sdf;10.218.203.17;3576011;894002;23458
> sdk;10.218.204.17;3558113;889528;23576
> sdc;10.219.128.17;3577384;894346;23449
> sde;10.219.202.17;3575401;893850;23462
> sdj;10.219.203.17;3567798;891949;23512
> sdl;10.219.204.17;3584262;896065;23404
> sdd;10.220.128.17;4430680;1107670;18933
> sdh;10.220.202.17;4488286;1122071;18690
> sdi;10.220.203.17;4487326;1121831;18694
> sdm;10.220.204.17;4441236;1110309;18888
>
> 4.5.0_rc5_5e47f198_00036
> sdb;10.218.128.17;3519597;879899;23834
> sdi;10.218.202.17;3512229;878057;23884
> sdh;10.218.203.17;3518563;879640;23841
> sdk;10.218.204.17;3582119;895529;23418
> sdd;10.219.128.17;3550883;887720;23624
> sdj;10.219.202.17;3558415;889603;23574
> sde;10.219.203.17;3552086;888021;23616
> sdl;10.219.204.17;3579521;894880;23435
> sdc;10.220.128.17;4532912;1133228;18506
> sdf;10.220.202.17;4558035;1139508;18404
> sdg;10.220.203.17;4601035;1150258;18232
> sdm;10.220.204.17;4548150;1137037;18444
>
> While bisecting the kernel, I also stumbled across one that worked
> really well for both adapters which I haven't seen in the release
> kernels.
>
> 4.5.0_rc3_1aaa57f5_00399
> sdc;10.218.128.17;4627942;1156985;18126
> sdf;10.218.202.17;4590963;1147740;18272
> sdk;10.218.203.17;4564980;1141245;18376
> sdn;10.218.204.17;4571946;1142986;18348
> sdd;10.219.128.17;4591717;1147929;18269
> sdi;10.219.202.17;4505644;1126411;18618
> sdg;10.219.203.17;4562001;1140500;18388
> sdl;10.219.204.17;4583187;1145796;18303
> sde;10.220.128.17;5511568;1377892;15220
> sdh;10.220.202.17;5515555;1378888;15209
> sdj;10.220.203.17;5609983;1402495;14953
> sdm;10.220.204.17;5509035;1377258;15227
>
> Here the ConnectX-3 card is performing perfectly while the Connect-IB
> card still has some room for improvement.
>
> I'd like to get to the bottom of why I'm not seeing the same
> performance out of the newer kernels, but I just don't understand the
> code. I've tried to do what I can in narrowing down where major
> changes happened in the kernel to cause these changes in hopes that it
> would help someone on the list. If there is anything I can do to help
> out, please let me know.
>
> Thank you,
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Fri, Jun 10, 2016 at 3:36 PM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>> I bisected the kernel and it looks like the performance of the
>> Connect-IB card goes down and the performance of the ConnectX-3 card
>> goes up with this commit (but I'm not sure why this would cause this):
>>
>> ab46db0a3325a064bb24e826b12995d157565efb is the first bad commit
>> commit ab46db0a3325a064bb24e826b12995d157565efb
>> Author: Jiri Olsa <jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> Date:   Thu Dec 3 10:06:43 2015 +0100
>>
>>    perf stat: Use perf_evlist__enable in handle_initial_delay
>>
>>    No need to mimic the behaviour of perf_evlist__enable, we can use it
>>    directly.
>>
>>    Signed-off-by: Jiri Olsa <jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>    Tested-by: Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>    Cc: Adrian Hunter <adrian.hunter-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>>    Cc: David Ahern <dsahern-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>    Cc: Namhyung Kim <namhyung-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>    Cc: Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>
>>    Link: http://lkml.kernel.org/r/1449133606-14429-5-git-send-email-jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
>>    Signed-off-by: Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>
>> :040000 040000 67e69893bf6d47b372e08d7089d37a7b9f602fa7
>> b63d9b366f078eabf86f4da3d1cc53ae7434a949 M      tools
>>
>> 4.4.0_rc2_3e27c920
>> sdc;10.218.128.17;5291495;1322873;15853
>> sde;10.218.202.17;4966024;1241506;16892
>> sdh;10.218.203.17;4980471;1245117;16843
>> sdk;10.218.204.17;4966612;1241653;16890
>> sdd;10.219.128.17;5060084;1265021;16578
>> sdf;10.219.202.17;5065278;1266319;16561
>> sdi;10.219.203.17;5047600;1261900;16619
>> sdl;10.219.204.17;5036992;1259248;16654
>> sdn;10.220.128.17;3775081;943770;22221
>> sdg;10.220.202.17;3758336;939584;22320
>> sdj;10.220.203.17;3792832;948208;22117
>> sdm;10.220.204.17;3771516;942879;22242
>>
>> 4.4.0_rc2_ab46db0a
>> sdc;10.218.128.17;3792146;948036;22121
>> sdf;10.218.202.17;3738405;934601;22439
>> sdj;10.218.203.17;3764239;941059;22285
>> sdl;10.218.204.17;3785302;946325;22161
>> sdd;10.219.128.17;3762382;940595;22296
>> sdg;10.219.202.17;3765760;941440;22276
>> sdi;10.219.203.17;3873751;968437;21655
>> sdm;10.219.204.17;3769483;942370;22254
>> sde;10.220.128.17;5022517;1255629;16702
>> sdh;10.220.202.17;5018911;1254727;16714
>> sdk;10.220.203.17;5037295;1259323;16653
>> sdn;10.220.204.17;5033064;1258266;16667
>>
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Wed, Jun 8, 2016 at 9:33 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>>> With 4.1.15, the C-IB card gets about 1.15 MIOPs, while the CX3 gets
>>> about 0.99 MIOPs. But starting with the 4.4.4 kernel, the C-IB card
>>> drops to 0.96 MIOPs and the CX3 card jumps to 1.25 MIOPs. In the 4.6.0
>>> kernel, both cards drop, the C-IB to 0.82 MIOPs and the CX3 to 1.15
>>> MIOPs. I confirmed this morning that the card order was swapped on the
>>> 4.6.0 kernel and it was not different ports of the C-IB performing
>>> differently, but different cards.
>>>
>>> Given the limitations of the PCIe 8x port for the CX3, I think 1.25
>>> MIOPs is about the best we can do there. In summary, the performance
>>> of the C-IB card drops after 4.1.15 and gets progressively worse as
>>> the kernels increase. The CX3 card peaks at the 4.4.4 kernel and
>>> degrades a bit on the 4.6.0 kernel.
>>>
>>> Increasing the IO depth by adding jobs does not improve performance,
>>> it actually decreases performance. Based on an average of 4 runs at
>>> each job number from 1-80, the Goldilocks zone is 31-57 jobs where the
>>> difference in performance is less than 1%.
>>>
>>> Similarly, increasing block request size does not really change the
>>> figures to reach line speed.
>>>
>>> Here is the output of the 4.6.0 kernel with 4M bs:
>>> sdc;10.218.128.17;3354638;819;25006
>>> sdf;10.218.202.17;3376920;824;24841
>>> sdm;10.218.203.17;3367431;822;24911
>>> sdk;10.218.204.17;3378960;824;24826
>>> sde;10.219.128.17;3366350;821;24919
>>> sdl;10.219.202.17;3379641;825;24821
>>> sdg;10.219.203.17;3391254;827;24736
>>> sdn;10.219.204.17;3401706;830;24660
>>> sdd;10.220.128.17;4597505;1122;18246
>>> sdi;10.220.202.17;4594231;1121;18259
>>> sdj;10.220.203.17;4667598;1139;17972
>>> sdh;10.220.204.17;4628197;1129;18125
>>>
>>> The CPU on the target is a kworker thread at 96%, but no single
>>> processor over 15%. The initiator has low fio CPU utilization (<10%)
>>> for each job and no single CPU over 22% utilized.
>>>
>>> I have tried manually spreading the IRQ affinity over the processors
>>> of the respective NUMA nodes and there was no noticeable change in
>>> performance when doing so.
>>>
>>> Loading ib_iser on the initiator shows maybe a slight increase in performance:
>>>
>>> sdc;10.218.128.17;3396885;849221;24695
>>> sdf;10.218.202.17;3429240;857310;24462
>>> sdi;10.218.203.17;3454234;863558;24285
>>> sdm;10.218.204.17;3391666;847916;24733
>>> sde;10.219.128.17;3403914;850978;24644
>>> sdh;10.219.202.17;3491034;872758;24029
>>> sdk;10.219.203.17;3390569;847642;24741
>>> sdl;10.219.204.17;3498898;874724;23975
>>> sdd;10.220.128.17;4664743;1166185;17983
>>> sdg;10.220.202.17;4624880;1156220;18138
>>> sdj;10.220.203.17;4616227;1154056;18172
>>> sdn;10.220.204.17;4619786;1154946;18158
>>>
>>> I'd like to see the C-IB card at 1.25+ MIOPs (I know that the target
>>> can do that performance and we were limited on the CX3 by the PCIe bus
>>> which isn't an issue with the 16x C-IB card for a single port).
>>> Although the loss of performance in the CX3 card is concerning, I'm
>>> mostly focused on the C-IB card at the moment. I will probably start
>>> bisecting 4.1.15 to 4.4.4 to see if I can identify when the
>>> performance of the C-IB card degrades.
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>
>>>
>>> On Wed, Jun 8, 2016 at 7:52 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>>
>>>>
>>>> On 6/8/2016 1:37 AM, Robert LeBlanc wrote:
>>>>>
>>>>> On the 4.1.15 kernel:
>>>>> sdc;10.218.128.17;3971878;992969;21120
>>>>> sdd;10.218.202.17;3967745;991936;21142
>>>>> sdg;10.218.203.17;3938128;984532;21301
>>>>> sdk;10.218.204.17;3952602;988150;21223
>>>>> sdn;10.219.128.17;4615719;1153929;18174
>>>>> sdf;10.219.202.17;4622331;1155582;18148
>>>>> sdi;10.219.203.17;4602297;1150574;18227
>>>>> sdl;10.219.204.17;4565477;1141369;18374
>>>>> sde;10.220.128.17;4594986;1148746;18256
>>>>> sdh;10.220.202.17;4590209;1147552;18275
>>>>> sdj;10.220.203.17;4599017;1149754;18240
>>>>> sdm;10.220.204.17;4610898;1152724;18193
>>>>>
>>>>> On the 4.6.0 kernel:
>>>>> sdc;10.218.128.17;3239219;809804;25897
>>>>> sdf;10.218.202.17;3321300;830325;25257
>>>>> sdm;10.218.203.17;3339015;834753;25123
>>>>> sdk;10.218.204.17;3637573;909393;23061
>>>>> sde;10.219.128.17;3325777;831444;25223
>>>>> sdl;10.219.202.17;3305464;826366;25378
>>>>> sdg;10.219.203.17;3304032;826008;25389
>>>>> sdn;10.219.204.17;3330001;832500;25191
>>>>> sdd;10.220.128.17;4624370;1156092;18140
>>>>> sdi;10.220.202.17;4619277;1154819;18160
>>>>> sdj;10.220.203.17;4610138;1152534;18196
>>>>> sdh;10.220.204.17;4586445;1146611;18290
>>>>>
>>>>> It seems that there is a lot of changes between the kernels. I had
>>>>> these kernels already on the box and I can bisect them if you think it
>>>>> would help. It is really odd that port 2 on the Connect-IB card did
>>>>> better than port 1 on the 4.6.0 kernel.
>>>>> ----------------
>>>>> Robert LeBlanc
>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>
>>>>
>>>> so in these kernels you get better performance with the C-IB than CX3 ?
>>>> we need to find the bottleneck.
>>>> Can you increase the iodepth and/or block size to see if we can reach the
>>>> wire speed.
>>>> another try is to load ib_iser with always_register=N.
>>>>
>>>> what is the cpu utilzation in both initiator/target ?
>>>> did you spread the irq affinity ?
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 7, 2016 at 10:48 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
>>>>> wrote:
>>>>>>
>>>>>> The target is LIO (same kernel) with a 200 GB RAM disk and I'm running
>>>>>> fio as follows:
>>>>>>
>>>>>> fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
>>>>>> --group_reporting --minimal |  cut -d';' -f7,8,9
>>>>>>
>>>>>> All of the paths are set the same with noop and nomerges to either 1
>>>>>> or 2 (doesn't make a big difference).
>>>>>>
>>>>>> I started looking into this when the 4.6 kernel wasn't performing as
>>>>>> well as we were able to get the 4.4 kernel to work. I went back to the
>>>>>> 4.4 kernel and I could not replicate the 4+ million IOPs. So I started
>>>>>> breaking down the problem to smaller pieces and found this anomaly.
>>>>>> Since there hasn't been any suggestions up to this point, I'll check
>>>>>> other kernel version to see if it is specific to certain kernels. If
>>>>>> you need more information, please let me know.
>>>>>>
>>>>>> Thanks,
>>>>>> ----------------
>>>>>> Robert LeBlanc
>>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm trying to understand why our Connect-IB card is not performing as
>>>>>>>> well as our ConnectX-3 card. There are 3 ports between the two cards
>>>>>>>> and 12 paths to the iSER target which is a RAM disk.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>>
>>>>>>>> When I run fio against each path individually, I get:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> What is the scenario (bs, numjobs, iodepth) for each run ?
>>>>>>> Which target do you use ? backing store ?
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> disk;target IP;bandwidth,IOPs,Execution time
>>>>>>>> sdn;10.218.128.17;5053682;1263420;16599
>>>>>>>> sde;10.218.202.17;5032158;1258039;16670
>>>>>>>> sdh;10.218.203.17;4993516;1248379;16799
>>>>>>>> sdk;10.218.204.17;5081848;1270462;16507
>>>>>>>> sdc;10.219.128.17;3750942;937735;22364
>>>>>>>> sdf;10.219.202.17;3746921;936730;22388
>>>>>>>> sdi;10.219.203.17;3873929;968482;21654
>>>>>>>> sdl;10.219.204.17;3841465;960366;21837
>>>>>>>> sdd;10.220.128.17;3760358;940089;22308
>>>>>>>> sdg;10.220.202.17;3866252;966563;21697
>>>>>>>> sdj;10.220.203.17;3757495;939373;22325
>>>>>>>> sdm;10.220.204.17;4064051;1016012;20641
>>>>>>>>
>>>>

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                               ` <3646a0c9-3f2d-66b8-c4da-c91ca1d01cee-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-06-20 21:52                                 ` Robert LeBlanc
  0 siblings, 0 replies; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-20 21:52 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA

I can test with SRP and report back what I find (haven't used SRP in
years so I'll need to brush up on it).
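
For my own notes, the manual way I remember bringing up the SRP
initiator (module name and sysfs layout may have changed since I last
touched it, so treat this as a sketch):

modprobe ib_srp
ibsrpdm -c    # print discovered targets in add_target format
# paste one of the printed lines into the matching port's add_target:
echo "id_ext=...,ioc_guid=...,dgid=...,pkey=ffff,service_id=..." > \
    /sys/class/infiniband_srp/srp-mlx5_0-1/add_target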
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Jun 20, 2016 at 3:27 PM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> Did you see this kind of regression in SRP ? or with some other target (e.g
> TGT) ?
> Trying to understand if it's a ULP issue or LLD...
>
>
> On 6/20/2016 6:23 PM, Robert LeBlanc wrote:
>>
>> Adding linux-scsi
>>
>> This last week I tried to figure out where a 10-15% decrease in
>> performance showed up between 4.5 and 4.6 using iSER and ConnectX-3
>> and Connect-IB cards (10.{218,219}.*.17 are Connect-IB and 10.220.*.17
>> are ConnectX-3). To review, straight RDMA transfers between cards
>> showed line rate was being achieved, just iSER was not able to achieve
>> those same rates for some cards on different kernels.
>>
>> 4.5 vanilla default config
>> sdc;10.218.128.17;3800048;950012;22075
>> sdi;10.218.202.17;3757158;939289;22327
>> sdg;10.218.203.17;3774062;943515;22227
>> sdn;10.218.204.17;3816299;954074;21981
>> sdd;10.219.128.17;3821863;955465;21949
>> sdf;10.219.202.17;3784106;946026;22168
>> sdj;10.219.203.17;3827094;956773;21919
>> sdm;10.219.204.17;3788208;947052;22144
>> sde;10.220.128.17;5054596;1263649;16596
>> sdh;10.220.202.17;5013811;1253452;16731
>> sdl;10.220.203.17;5052160;1263040;16604
>> sdk;10.220.204.17;4990248;1247562;16810
>>
>> 4.6 vanilla default config
>> sde;10.218.128.17;3431063;857765;24449
>> sdf;10.218.202.17;3360685;840171;24961
>> sdi;10.218.203.17;3355174;838793;25002
>> sdm;10.218.204.17;3360955;840238;24959
>> sdd;10.219.128.17;3337288;834322;25136
>> sdh;10.219.202.17;3327492;831873;25210
>> sdj;10.219.203.17;3380867;845216;24812
>> sdk;10.219.204.17;3418340;854585;24540
>> sdc;10.220.128.17;4668377;1167094;17969
>> sdg;10.220.202.17;4716675;1179168;17785
>> sdl;10.220.203.17;4675663;1168915;17941
>> sdn;10.220.204.17;4631519;1157879;18112
>>
>> I narrowed the performance degradation to this series
>> 7861728..5e47f19, but while trying to bisect it, the changes were
>> erratic between each commit that I could not figure out exactly which
>> introduced the issue. If someone could give me some pointers on what
>> to do, I can keep trying to dig through this.
>>
>> 4.5.0_rc5_7861728d_00001
>> sdc;10.218.128.17;3747591;936897;22384
>> sdf;10.218.202.17;3750607;937651;22366
>> sdh;10.218.203.17;3750439;937609;22367
>> sdn;10.218.204.17;3771008;942752;22245
>> sde;10.219.128.17;3867678;966919;21689
>> sdg;10.219.202.17;3781889;945472;22181
>> sdk;10.219.203.17;3791804;947951;22123
>> sdl;10.219.204.17;3795406;948851;22102
>> sdd;10.220.128.17;5039110;1259777;16647
>> sdi;10.220.202.17;4992921;1248230;16801
>> sdj;10.220.203.17;5015610;1253902;16725
>> sdm;10.220.204.17;5087087;1271771;16490
>>
>> 4.5.0_rc5_f81bf458_00018
>> sdb;10.218.128.17;5023720;1255930;16698
>> sde;10.218.202.17;5016809;1254202;16721
>> sdj;10.218.203.17;5021915;1255478;16704
>> sdk;10.218.204.17;5021314;1255328;16706
>> sdc;10.219.128.17;4984318;1246079;16830
>> sdf;10.219.202.17;4986096;1246524;16824
>> sdh;10.219.203.17;5043958;1260989;16631
>> sdm;10.219.204.17;5032460;1258115;16669
>> sdd;10.220.128.17;3736740;934185;22449
>> sdg;10.220.202.17;3728767;932191;22497
>> sdi;10.220.203.17;3752117;938029;22357
>> sdl;10.220.204.17;3763901;940975;22287
>>
>> 4.5.0_rc5_07b63196_00027
>> sdb;10.218.128.17;3606142;901535;23262
>> sdg;10.218.202.17;3570988;892747;23491
>> sdf;10.218.203.17;3576011;894002;23458
>> sdk;10.218.204.17;3558113;889528;23576
>> sdc;10.219.128.17;3577384;894346;23449
>> sde;10.219.202.17;3575401;893850;23462
>> sdj;10.219.203.17;3567798;891949;23512
>> sdl;10.219.204.17;3584262;896065;23404
>> sdd;10.220.128.17;4430680;1107670;18933
>> sdh;10.220.202.17;4488286;1122071;18690
>> sdi;10.220.203.17;4487326;1121831;18694
>> sdm;10.220.204.17;4441236;1110309;18888
>>
>> 4.5.0_rc5_5e47f198_00036
>> sdb;10.218.128.17;3519597;879899;23834
>> sdi;10.218.202.17;3512229;878057;23884
>> sdh;10.218.203.17;3518563;879640;23841
>> sdk;10.218.204.17;3582119;895529;23418
>> sdd;10.219.128.17;3550883;887720;23624
>> sdj;10.219.202.17;3558415;889603;23574
>> sde;10.219.203.17;3552086;888021;23616
>> sdl;10.219.204.17;3579521;894880;23435
>> sdc;10.220.128.17;4532912;1133228;18506
>> sdf;10.220.202.17;4558035;1139508;18404
>> sdg;10.220.203.17;4601035;1150258;18232
>> sdm;10.220.204.17;4548150;1137037;18444
>>
>> While bisecting the kernel, I also stumbled across one that worked
>> really well for both adapters which I haven't seen in the release
>> kernels.
>>
>> 4.5.0_rc3_1aaa57f5_00399
>> sdc;10.218.128.17;4627942;1156985;18126
>> sdf;10.218.202.17;4590963;1147740;18272
>> sdk;10.218.203.17;4564980;1141245;18376
>> sdn;10.218.204.17;4571946;1142986;18348
>> sdd;10.219.128.17;4591717;1147929;18269
>> sdi;10.219.202.17;4505644;1126411;18618
>> sdg;10.219.203.17;4562001;1140500;18388
>> sdl;10.219.204.17;4583187;1145796;18303
>> sde;10.220.128.17;5511568;1377892;15220
>> sdh;10.220.202.17;5515555;1378888;15209
>> sdj;10.220.203.17;5609983;1402495;14953
>> sdm;10.220.204.17;5509035;1377258;15227
>>
>> Here the ConnectX-3 card is performing perfectly while the Connect-IB
>> card still has some room for improvement.
>>
>> I'd like to get to the bottom of why I'm not seeing the same
>> performance out of the newer kernels, but I just don't understand the
>> code. I've tried to do what I can in narrowing down where major
>> changes happened in the kernel to cause these changes in hopes that it
>> would help someone on the list. If there is anything I can do to help
>> out, please let me know.
>>
>> Thank you,
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Fri, Jun 10, 2016 at 3:36 PM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
>> wrote:
>>>
>>> I bisected the kernel and it looks like the performance of the
>>> Connect-IB card goes down and the performance of the ConnectX-3 card
>>> goes up with this commit (but I'm not sure why this would cause this):
>>>
>>> ab46db0a3325a064bb24e826b12995d157565efb is the first bad commit
>>> commit ab46db0a3325a064bb24e826b12995d157565efb
>>> Author: Jiri Olsa <jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>> Date:   Thu Dec 3 10:06:43 2015 +0100
>>>
>>>    perf stat: Use perf_evlist__enable in handle_initial_delay
>>>
>>>    No need to mimic the behaviour of perf_evlist__enable, we can use it
>>>    directly.
>>>
>>>    Signed-off-by: Jiri Olsa <jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>>    Tested-by: Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>>    Cc: Adrian Hunter <adrian.hunter-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>>>    Cc: David Ahern <dsahern-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>>    Cc: Namhyung Kim <namhyung-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>>    Cc: Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>
>>>    Link:
>>> http://lkml.kernel.org/r/1449133606-14429-5-git-send-email-jolsa-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
>>>    Signed-off-by: Arnaldo Carvalho de Melo <acme-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>>
>>> :040000 040000 67e69893bf6d47b372e08d7089d37a7b9f602fa7
>>> b63d9b366f078eabf86f4da3d1cc53ae7434a949 M      tools
>>>
>>> 4.4.0_rc2_3e27c920
>>> sdc;10.218.128.17;5291495;1322873;15853
>>> sde;10.218.202.17;4966024;1241506;16892
>>> sdh;10.218.203.17;4980471;1245117;16843
>>> sdk;10.218.204.17;4966612;1241653;16890
>>> sdd;10.219.128.17;5060084;1265021;16578
>>> sdf;10.219.202.17;5065278;1266319;16561
>>> sdi;10.219.203.17;5047600;1261900;16619
>>> sdl;10.219.204.17;5036992;1259248;16654
>>> sdn;10.220.128.17;3775081;943770;22221
>>> sdg;10.220.202.17;3758336;939584;22320
>>> sdj;10.220.203.17;3792832;948208;22117
>>> sdm;10.220.204.17;3771516;942879;22242
>>>
>>> 4.4.0_rc2_ab46db0a
>>> sdc;10.218.128.17;3792146;948036;22121
>>> sdf;10.218.202.17;3738405;934601;22439
>>> sdj;10.218.203.17;3764239;941059;22285
>>> sdl;10.218.204.17;3785302;946325;22161
>>> sdd;10.219.128.17;3762382;940595;22296
>>> sdg;10.219.202.17;3765760;941440;22276
>>> sdi;10.219.203.17;3873751;968437;21655
>>> sdm;10.219.204.17;3769483;942370;22254
>>> sde;10.220.128.17;5022517;1255629;16702
>>> sdh;10.220.202.17;5018911;1254727;16714
>>> sdk;10.220.203.17;5037295;1259323;16653
>>> sdn;10.220.204.17;5033064;1258266;16667
>>>
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>
>>>
>>> On Wed, Jun 8, 2016 at 9:33 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
>>> wrote:
>>>>
>>>> With 4.1.15, the C-IB card gets about 1.15 MIOPs, while the CX3 gets
>>>> about 0.99 MIOPs. But starting with the 4.4.4 kernel, the C-IB card
>>>> drops to 0.96 MIOPs and the CX3 card jumps to 1.25 MIOPs. In the 4.6.0
>>>> kernel, both cards drop, the C-IB to 0.82 MIOPs and the CX3 to 1.15
>>>> MIOPs. I confirmed this morning that the card order was swapped on the
>>>> 4.6.0 kernel and it was not different ports of the C-IB performing
>>>> differently, but different cards.
>>>>
>>>> Given the limitations of the PCIe 8x port for the CX3, I think 1.25
>>>> MIOPs is about the best we can do there. In summary, the performance
>>>> of the C-IB card drops after 4.1.15 and gets progressively worse as
>>>> the kernels increase. The CX3 card peaks at the 4.4.4 kernel and
>>>> degrades a bit on the 4.6.0 kernel.
>>>>
>>>> Increasing the IO depth by adding jobs does not improve performance,
>>>> it actually decreases performance. Based on an average of 4 runs at
>>>> each job number from 1-80, the Goldilocks zone is 31-57 jobs where the
>>>> difference in performance is less than 1%.
>>>>
>>>> Similarly, increasing block request size does not really change the
>>>> figures to reach line speed.
>>>>
>>>> Here is the output of the 4.6.0 kernel with 4M bs:
>>>> sdc;10.218.128.17;3354638;819;25006
>>>> sdf;10.218.202.17;3376920;824;24841
>>>> sdm;10.218.203.17;3367431;822;24911
>>>> sdk;10.218.204.17;3378960;824;24826
>>>> sde;10.219.128.17;3366350;821;24919
>>>> sdl;10.219.202.17;3379641;825;24821
>>>> sdg;10.219.203.17;3391254;827;24736
>>>> sdn;10.219.204.17;3401706;830;24660
>>>> sdd;10.220.128.17;4597505;1122;18246
>>>> sdi;10.220.202.17;4594231;1121;18259
>>>> sdj;10.220.203.17;4667598;1139;17972
>>>> sdh;10.220.204.17;4628197;1129;18125
>>>>
>>>> The CPU on the target is a kworker thread at 96%, but no single
>>>> processor over 15%. The initiator has low fio CPU utilization (<10%)
>>>> for each job and no single CPU over 22% utilized.
>>>>
>>>> I have tried manually spreading the IRQ affinity over the processors
>>>> of the respective NUMA nodes and there was no noticeable change in
>>>> performance when doing so.
>>>>
>>>> Loading ib_iser on the initiator shows maybe a slight increase in
>>>> performance:
>>>>
>>>> sdc;10.218.128.17;3396885;849221;24695
>>>> sdf;10.218.202.17;3429240;857310;24462
>>>> sdi;10.218.203.17;3454234;863558;24285
>>>> sdm;10.218.204.17;3391666;847916;24733
>>>> sde;10.219.128.17;3403914;850978;24644
>>>> sdh;10.219.202.17;3491034;872758;24029
>>>> sdk;10.219.203.17;3390569;847642;24741
>>>> sdl;10.219.204.17;3498898;874724;23975
>>>> sdd;10.220.128.17;4664743;1166185;17983
>>>> sdg;10.220.202.17;4624880;1156220;18138
>>>> sdj;10.220.203.17;4616227;1154056;18172
>>>> sdn;10.220.204.17;4619786;1154946;18158
>>>>
>>>> I'd like to see the C-IB card at 1.25+ MIOPs (I know that the target
>>>> can do that performance and we were limited on the CX3 by the PCIe bus
>>>> which isn't an issue with the 16x C-IB card for a single port).
>>>> Although the loss of performance in the CX3 card is concerning, I'm
>>>> mostly focused on the C-IB card at the moment. I will probably start
>>>> bisecting 4.1.15 to 4.4.4 to see if I can identify when the
>>>> performance of the C-IB card degrades.
>>>> ----------------
>>>> Robert LeBlanc
>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>
>>>>
>>>> On Wed, Jun 8, 2016 at 7:52 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 6/8/2016 1:37 AM, Robert LeBlanc wrote:
>>>>>>
>>>>>>
>>>>>> On the 4.1.15 kernel:
>>>>>> sdc;10.218.128.17;3971878;992969;21120
>>>>>> sdd;10.218.202.17;3967745;991936;21142
>>>>>> sdg;10.218.203.17;3938128;984532;21301
>>>>>> sdk;10.218.204.17;3952602;988150;21223
>>>>>> sdn;10.219.128.17;4615719;1153929;18174
>>>>>> sdf;10.219.202.17;4622331;1155582;18148
>>>>>> sdi;10.219.203.17;4602297;1150574;18227
>>>>>> sdl;10.219.204.17;4565477;1141369;18374
>>>>>> sde;10.220.128.17;4594986;1148746;18256
>>>>>> sdh;10.220.202.17;4590209;1147552;18275
>>>>>> sdj;10.220.203.17;4599017;1149754;18240
>>>>>> sdm;10.220.204.17;4610898;1152724;18193
>>>>>>
>>>>>> On the 4.6.0 kernel:
>>>>>> sdc;10.218.128.17;3239219;809804;25897
>>>>>> sdf;10.218.202.17;3321300;830325;25257
>>>>>> sdm;10.218.203.17;3339015;834753;25123
>>>>>> sdk;10.218.204.17;3637573;909393;23061
>>>>>> sde;10.219.128.17;3325777;831444;25223
>>>>>> sdl;10.219.202.17;3305464;826366;25378
>>>>>> sdg;10.219.203.17;3304032;826008;25389
>>>>>> sdn;10.219.204.17;3330001;832500;25191
>>>>>> sdd;10.220.128.17;4624370;1156092;18140
>>>>>> sdi;10.220.202.17;4619277;1154819;18160
>>>>>> sdj;10.220.203.17;4610138;1152534;18196
>>>>>> sdh;10.220.204.17;4586445;1146611;18290
>>>>>>
>>>>>> It seems that there is a lot of changes between the kernels. I had
>>>>>> these kernels already on the box and I can bisect them if you think it
>>>>>> would help. It is really odd that port 2 on the Connect-IB card did
>>>>>> better than port 1 on the 4.6.0 kernel.
>>>>>> ----------------
>>>>>> Robert LeBlanc
>>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>>
>>>>>
>>>>>
>>>>> so in these kernels you get better performance with the C-IB than CX3 ?
>>>>> we need to find the bottleneck.
>>>>> Can you increase the iodepth and/or block size to see if we can reach
>>>>> the
>>>>> wire speed.
>>>>> another try is to load ib_iser with always_register=N.
>>>>>
>>>>> what is the cpu utilzation in both initiator/target ?
>>>>> did you spread the irq affinity ?
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 7, 2016 at 10:48 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> The target is LIO (same kernel) with a 200 GB RAM disk and I'm
>>>>>>> running
>>>>>>> fio as follows:
>>>>>>>
>>>>>>> fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
>>>>>>> --group_reporting --minimal |  cut -d';' -f7,8,9
>>>>>>>
>>>>>>> All of the paths are set the same with noop and nomerges to either 1
>>>>>>> or 2 (doesn't make a big difference).
>>>>>>>
>>>>>>> I started looking into this when the 4.6 kernel wasn't performing as
>>>>>>> well as we were able to get the 4.4 kernel to work. I went back to
>>>>>>> the
>>>>>>> 4.4 kernel and I could not replicate the 4+ million IOPs. So I
>>>>>>> started
>>>>>>> breaking down the problem to smaller pieces and found this anomaly.
>>>>>>> Since there hasn't been any suggestions up to this point, I'll check
>>>>>>> other kernel version to see if it is specific to certain kernels. If
>>>>>>> you need more information, please let me know.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> ----------------
>>>>>>> Robert LeBlanc
>>>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm trying to understand why our Connect-IB card is not performing
>>>>>>>>> as
>>>>>>>>> well as our ConnectX-3 card. There are 3 ports between the two
>>>>>>>>> cards
>>>>>>>>> and 12 paths to the iSER target which is a RAM disk.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> <snip>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> When I run fio against each path individually, I get:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What is the scenario (bs, numjobs, iodepth) for each run ?
>>>>>>>> Which target do you use ? backing store ?
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> disk;target IP;bandwidth,IOPs,Execution time
>>>>>>>>> sdn;10.218.128.17;5053682;1263420;16599
>>>>>>>>> sde;10.218.202.17;5032158;1258039;16670
>>>>>>>>> sdh;10.218.203.17;4993516;1248379;16799
>>>>>>>>> sdk;10.218.204.17;5081848;1270462;16507
>>>>>>>>> sdc;10.219.128.17;3750942;937735;22364
>>>>>>>>> sdf;10.219.202.17;3746921;936730;22388
>>>>>>>>> sdi;10.219.203.17;3873929;968482;21654
>>>>>>>>> sdl;10.219.204.17;3841465;960366;21837
>>>>>>>>> sdd;10.220.128.17;3760358;940089;22308
>>>>>>>>> sdg;10.220.202.17;3866252;966563;21697
>>>>>>>>> sdj;10.220.203.17;3757495;939373;22325
>>>>>>>>> sdm;10.220.204.17;4064051;1016012;20641
>>>>>>>>>
>>>>>
>

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                           ` <CAANLjFqoV-5HK0c+LdEbuxd81Vm=g=WE3cQgp47dH-yfYjZjGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-06-20 21:27                             ` Max Gurtovoy
@ 2016-06-21 13:08                             ` Sagi Grimberg
       [not found]                               ` <57693C6A.3020805-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2016-06-21 13:08 UTC (permalink / raw)
  To: Robert LeBlanc, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA
  Cc: Max Gurtovoy

Hey Robert,

> I narrowed the performance degradation to this series
> 7861728..5e47f19, but while trying to bisect it, the changes were
> erratic between each commit that I could not figure out exactly which
> introduced the issue. If someone could give me some pointers on what
> to do, I can keep trying to dig through this.

This bisection brings suspects:

e3416ab2d156 iser-target: Kill the ->isert_cmd back pointer in struct 
iser_tx_desc
d1ca2ed7dcf8 iser-target: Kill struct isert_rdma_wr
9679cc51eb13 iser-target: Convert to new CQ API
5adabdd122e4 iser-target: Split and properly type the login buffer
ed1083b251f0 iser-target: Remove ISER_RECV_DATA_SEG_LEN
26c7b673db57 iser-target: Remove impossible condition from isert_wait_conn
69c48846f1c7 iser-target: Remove redundant wait in release_conn
6d1fba0c2cc7 iser-target: Rework connection termination
f81bf458208e iser-target: Separate flows for np listeners and 
connections cma events
aea92980601f iser-target: Add new state ISER_CONN_BOUND to isert_conn
b89a7c25462b iser-target: Fix identification of login rx descriptor type

However, I don't really see performance implications in these patches,
let alone something that would specifically affect Connect-IB...

Given that your bisection brings up target side patches, I have
a couple questions:

1. Is the CPU usage on the target side at 100%, or is the initiator
side the bottleneck?

2. Would it be possible to use another target implementation? TGT maybe?

3. Can you try testing right before 9679cc51eb13? This is a patch
that touches the data plane.
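For example, checking out the parent of that commit:

git checkout 9679cc51eb13^
# rebuild, reboot, and rerun the same fio job on that kernel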

4. Can you try the latest upstream kernel? The iser target code uses
a generic data-transfer library and I'm interested in knowing what
the status is there.

Cheers,
Sagi.

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                               ` <57693C6A.3020805-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-06-21 14:50                                 ` Robert LeBlanc
       [not found]                                   ` <CAANLjFpUyAYB+ZzMwFKBpa4yLmALPzcRGJX1kExVrLARZmZRkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-21 14:50 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Max Gurtovoy

Sagi,

I'm working on setting up SRP (I think I have it all working) so I
can test some of the commits. I can try TGT afterwards, along with
the commit you mention. I haven't been watching the CPU closely
lately, but when I was doing a lot of testing earlier there wasn't
any one thread at 100%. Several threads have high utilization, but
none reach 100%, and there is plenty of CPU capacity available (32
cores). I can capture some of that data if it is helpful. I did test
4.7_rc3 on Friday, but it didn't change much; is that "new" enough?
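
If per-thread numbers would be useful, I can collect something like
this on both sides during a run (sysstat is what I have handy, any
equivalent would do):

pidstat -t 1 30 > pidstat.txt     # per-thread CPU usage, 1s samples
mpstat -P ALL 1 30 > mpstat.txt   # per-CPU utilization across the box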

4.7.0_rc3_5edb5649
sdc;10.218.128.17;3260244;815061;25730
sdg;10.218.202.17;3405988;851497;24629
sdh;10.218.203.17;3307419;826854;25363
sdm;10.218.204.17;3430502;857625;24453
sdi;10.219.128.17;3544282;886070;23668
sdj;10.219.202.17;3412083;853020;24585
sdk;10.219.203.17;3422385;855596;24511
sdl;10.219.204.17;3444164;861041;24356
sdb;10.220.128.17;4803646;1200911;17463
sdd;10.220.202.17;4832982;1208245;17357
sde;10.220.203.17;4809430;1202357;17442
sdf;10.220.204.17;4808878;1202219;17444

Thanks for the suggestions, I'll work to get some of the requested
data back to you guys quickly.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Jun 21, 2016 at 7:08 AM, Sagi Grimberg <sagigrim-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hey Robert,
>
>> I narrowed the performance degradation to this series
>> 7861728..5e47f19, but while trying to bisect it, the changes were
>> erratic between each commit that I could not figure out exactly which
>> introduced the issue. If someone could give me some pointers on what
>> to do, I can keep trying to dig through this.
>
>
> This bisection brings suspects:
>
> e3416ab2d156 iser-target: Kill the ->isert_cmd back pointer in struct
> iser_tx_desc
> d1ca2ed7dcf8 iser-target: Kill struct isert_rdma_wr
> 9679cc51eb13 iser-target: Convert to new CQ API
> 5adabdd122e4 iser-target: Split and properly type the login buffer
> ed1083b251f0 iser-target: Remove ISER_RECV_DATA_SEG_LEN
> 26c7b673db57 iser-target: Remove impossible condition from isert_wait_conn
> 69c48846f1c7 iser-target: Remove redundant wait in release_conn
> 6d1fba0c2cc7 iser-target: Rework connection termination
> f81bf458208e iser-target: Separate flows for np listeners and connections
> cma events
> aea92980601f iser-target: Add new state ISER_CONN_BOUND to isert_conn
> b89a7c25462b iser-target: Fix identification of login rx descriptor type
>
> However I don't really see performance implications in these patches,
> not to mention something that would affect on ConnectIB...
>
> Given that your bisection brings up target side patches, I have
> a couple questions:
>
> 1. Are the CPU usage in the target side at 100%, or the initiator side
> is the bottleneck?
>
> 2. Would it be possible to use another target implementation? TGT maybe?
>
> 3. Can you try testing right before 9679cc51eb13? This is a patch that
> involves data-plane.
>
> 4. Can you try the latest upstream kernel? The iser target code uses
> a generic data-transfer library and I'm interested in knowing what is
> the status there.
>
> Cheers,
> Sagi.

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                                   ` <CAANLjFpUyAYB+ZzMwFKBpa4yLmALPzcRGJX1kExVrLARZmZRkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-21 20:26                                     ` Robert LeBlanc
       [not found]                                       ` <CAANLjFpeL0AkuGW-q5Bmm-dff0UqFOM_sAOaG7=vyqmwnOoTcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-06-22 16:21                                       ` Sagi Grimberg
  0 siblings, 2 replies; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-21 20:26 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Max Gurtovoy

Sagi & Max,

Here are the results of SRP using the same ramdisk backstore that I was
using for iSER (as close to the same as can be between reboots and
restoring the targetcli config). I also tested the commit before
9679cc51eb13 (5adabdd122e471fe978d49471624bab08b5373a7), which is
included here. I'm not seeing a correlation between the iSER and SRP
results that would lead me to believe the changes affect both
implementations.

Does this provide enough information for you, or do you think TGT will
be needed?

4.4 (afd2ff9) vanilla default config
sdb;10.218.128.17;5150176;1287544;16288
sdd;10.218.202.17;5092337;1273084;16473
sdh;10.218.203.17;5129078;1282269;16355
sdk;10.218.204.17;5129078;1282269;16355
sdg;10.219.128.17;5155874;1288968;16270
sdf;10.219.202.17;5131588;1282897;16347
sdi;10.219.203.17;5165399;1291349;16240
sdl;10.219.204.17;5157459;1289364;16265
sdc;10.220.128.17;3684223;921055;22769
sde;10.220.202.17;3692169;923042;22720
sdj;10.220.203.17;3699170;924792;22677
sdm;10.220.204.17;3697865;924466;22685

mlx5_0;sde;2968368;742092;28260
mlx4_0;sdd;3325645;831411;25224
mlx5_0;sdc;3023466;755866;27745

4.4.0_rc2_3e27c920
sdc;10.218.128.17;5291495;1322873;15853
sde;10.218.202.17;4966024;1241506;16892
sdh;10.218.203.17;4980471;1245117;16843
sdk;10.218.204.17;4966612;1241653;16890
sdd;10.219.128.17;5060084;1265021;16578
sdf;10.219.202.17;5065278;1266319;16561
sdi;10.219.203.17;5047600;1261900;16619
sdl;10.219.204.17;5036992;1259248;16654
sdn;10.220.128.17;3775081;943770;22221
sdg;10.220.202.17;3758336;939584;22320
sdj;10.220.203.17;3792832;948208;22117
sdm;10.220.204.17;3771516;942879;22242

mlx4_0;sde;4648715;1162178;18045 ~73% CPU ib_srpt_compl
mlx5_0;sdd;3476566;869141;24129 ~80% CPU ib_srpt_compl
mlx5_0;sdc;3492343;873085;24020

4.4.0_rc2_ab46db0a
sdc;10.218.128.17;3792146;948036;22121
sdf;10.218.202.17;3738405;934601;22439
sdj;10.218.203.17;3764239;941059;22285
sdl;10.218.204.17;3785302;946325;22161
sdd;10.219.128.17;3762382;940595;22296
sdg;10.219.202.17;3765760;941440;22276
sdi;10.219.203.17;3873751;968437;21655
sdm;10.219.204.17;3769483;942370;22254
sde;10.220.128.17;5022517;1255629;16702
sdh;10.220.202.17;5018911;1254727;16714
sdk;10.220.203.17;5037295;1259323;16653
sdn;10.220.204.17;5033064;1258266;16667

mlx4_0;sde;4635358;1158839;18097
mlx5_0;sdd;3459077;864769;24251
mlx5_0;sdc;3465650;866412;24205

4.5.0_rc3_1aaa57f5_00399

sdc;10.218.128.17;4627942;1156985;18126
sdf;10.218.202.17;4590963;1147740;18272
sdk;10.218.203.17;4564980;1141245;18376
sdn;10.218.204.17;4571946;1142986;18348
sdd;10.219.128.17;4591717;1147929;18269
sdi;10.219.202.17;4505644;1126411;18618
sdg;10.219.203.17;4562001;1140500;18388
sdl;10.219.204.17;4583187;1145796;18303
sde;10.220.128.17;5511568;1377892;15220
sdh;10.220.202.17;5515555;1378888;15209
sdj;10.220.203.17;5609983;1402495;14953
sdm;10.220.204.17;5509035;1377258;15227

mlx5_0;sde;3593013;898253;23347 100% CPU kworker/u69:2
mlx5_0;sdd;3588555;897138;23376 100% CPU kworker/u69:2
mlx4_0;sdc;3525662;881415;23793 100% CPU kworker/u68:0

4.5.0_rc5_7861728d_00001
sdc;10.218.128.17;3747591;936897;22384
sdf;10.218.202.17;3750607;937651;22366
sdh;10.218.203.17;3750439;937609;22367
sdn;10.218.204.17;3771008;942752;22245
sde;10.219.128.17;3867678;966919;21689
sdg;10.219.202.17;3781889;945472;22181
sdk;10.219.203.17;3791804;947951;22123
sdl;10.219.204.17;3795406;948851;22102
sdd;10.220.128.17;5039110;1259777;16647
sdi;10.220.202.17;4992921;1248230;16801
sdj;10.220.203.17;5015610;1253902;16725
sdm;10.220.204.17;5087087;1271771;16490

mlx5_0;sde;2930722;732680;28623 ~98% CPU kworker/u69:0
mlx5_0;sdd;2910891;727722;28818 ~98% CPU kworker/u69:0
mlx4_0;sdc;3263668;815917;25703 ~98% CPU kworker/u68:0

4.5.0_rc5_f81bf458_00018
sdb;10.218.128.17;5023720;1255930;16698
sde;10.218.202.17;5016809;1254202;16721
sdj;10.218.203.17;5021915;1255478;16704
sdk;10.218.204.17;5021314;1255328;16706
sdc;10.219.128.17;4984318;1246079;16830
sdf;10.219.202.17;4986096;1246524;16824
sdh;10.219.203.17;5043958;1260989;16631
sdm;10.219.204.17;5032460;1258115;16669
sdd;10.220.128.17;3736740;934185;22449
sdg;10.220.202.17;3728767;932191;22497
sdi;10.220.203.17;3752117;938029;22357
sdl;10.220.204.17;3763901;940975;22287

Srpt keeps crashing; couldn't test.

4.5.0_rc5_5adabdd1_00023
sdc;10.218.128.17;3726448;931612;22511 ~97% CPU kworker/u69:4
sdf;10.218.202.17;3750271;937567;22368
sdi;10.218.203.17;3749266;937316;22374
sdj;10.218.204.17;3798844;949711;22082
sde;10.219.128.17;3759852;939963;22311 ~97% CPU kworker/u69:4
sdg;10.219.202.17;3772534;943133;22236
sdl;10.219.203.17;3769483;942370;22254
sdn;10.219.204.17;3790604;947651;22130
sdd;10.220.128.17;5171130;1292782;16222 ~96% CPU kworker/u68:3
sdh;10.220.202.17;5105354;1276338;16431
sdk;10.220.203.17;4995300;1248825;16793
sdm;10.220.204.17;4959564;1239891;16914

Srpt crashes

4.5.0_rc5_07b63196_00027
sdb;10.218.128.17;3606142;901535;23262
sdg;10.218.202.17;3570988;892747;23491
sdf;10.218.203.17;3576011;894002;23458
sdk;10.218.204.17;3558113;889528;23576
sdc;10.219.128.17;3577384;894346;23449
sde;10.219.202.17;3575401;893850;23462
sdj;10.219.203.17;3567798;891949;23512
sdl;10.219.204.17;3584262;896065;23404
sdd;10.220.128.17;4430680;1107670;18933
sdh;10.220.202.17;4488286;1122071;18690
sdi;10.220.203.17;4487326;1121831;18694
sdm;10.220.204.17;4441236;1110309;18888

Srpt crashes

4.5.0_rc5_5e47f198_00036
sdb;10.218.128.17;3519597;879899;23834
sdi;10.218.202.17;3512229;878057;23884
sdh;10.218.203.17;3518563;879640;23841
sdk;10.218.204.17;3582119;895529;23418
sdd;10.219.128.17;3550883;887720;23624
sdj;10.219.202.17;3558415;889603;23574
sde;10.219.203.17;3552086;888021;23616
sdl;10.219.204.17;3579521;894880;23435
sdc;10.220.128.17;4532912;1133228;18506
sdf;10.220.202.17;4558035;1139508;18404
sdg;10.220.203.17;4601035;1150258;18232
sdm;10.220.204.17;4548150;1137037;18444

srpt crashes

4.6.2 vanilla default config
sde;10.218.128.17;3431063;857765;24449
sdf;10.218.202.17;3360685;840171;24961
sdi;10.218.203.17;3355174;838793;25002
sdm;10.218.204.17;3360955;840238;24959
sdd;10.219.128.17;3337288;834322;25136
sdh;10.219.202.17;3327492;831873;25210
sdj;10.219.203.17;3380867;845216;24812
sdk;10.219.204.17;3418340;854585;24540
sdc;10.220.128.17;4668377;1167094;17969
sdg;10.220.202.17;4716675;1179168;17785
sdl;10.220.203.17;4675663;1168915;17941
sdn;10.220.204.17;4631519;1157879;18112

mlx5_0;sde;3390021;847505;24745 ~98% CPU kworker/u69:3
mlx5_0;sdd;3207512;801878;26153 ~98% CPU kworker/u69:3
mlx4_0;sdc;2998072;749518;27980 ~98% CPU kworker/u68:0

4.7.0_rc3_5edb5649
sdc;10.218.128.17;3260244;815061;25730
sdg;10.218.202.17;3405988;851497;24629
sdh;10.218.203.17;3307419;826854;25363
sdm;10.218.204.17;3430502;857625;24453
sdi;10.219.128.17;3544282;886070;23668
sdj;10.219.202.17;3412083;853020;24585
sdk;10.219.203.17;3422385;855596;24511
sdl;10.219.204.17;3444164;861041;24356
sdb;10.220.128.17;4803646;1200911;17463
sdd;10.220.202.17;4832982;1208245;17357
sde;10.220.203.17;4809430;1202357;17442
sdf;10.220.204.17;4808878;1202219;17444

mlx5_0;sdd;2986864;746716;28085
mlx5_0;sdc;2963648;740912;28305
mlx4_0;sdb;3317228;829307;25288

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Jun 21, 2016 at 8:50 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
> Sagi,
>
> I'm working to implement SRP (I think I got it all working) to test
> some of the commits. I can try TGT afterwards and the commit you
> mention. I haven't been watching the CPU lately, but before when I was
> doing a lot of testing, there wasn't any one thread that was at 100%.
> There are several threads that have high utilization, but none 100%
> and there is plenty of CPU capacity available (32 cores). I can
> capture some of that data if it is helpful. I did test 4.7_rc3 on
> Friday, but it didn't change much, is that "new" enough?
>
> 4.7.0_rc3_5edb5649
> sdc;10.218.128.17;3260244;815061;25730
> sdg;10.218.202.17;3405988;851497;24629
> sdh;10.218.203.17;3307419;826854;25363
> sdm;10.218.204.17;3430502;857625;24453
> sdi;10.219.128.17;3544282;886070;23668
> sdj;10.219.202.17;3412083;853020;24585
> sdk;10.219.203.17;3422385;855596;24511
> sdl;10.219.204.17;3444164;861041;24356
> sdb;10.220.128.17;4803646;1200911;17463
> sdd;10.220.202.17;4832982;1208245;17357
> sde;10.220.203.17;4809430;1202357;17442
> sdf;10.220.204.17;4808878;1202219;17444
>
> Thanks for the suggestions, I'll work to get some of the requested
> data back to you guys quickly.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Tue, Jun 21, 2016 at 7:08 AM, Sagi Grimberg <sagigrim-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hey Robert,
>>
>>> I narrowed the performance degradation to this series
>>> 7861728..5e47f19, but while trying to bisect it, the changes were
>>> erratic between each commit that I could not figure out exactly which
>>> introduced the issue. If someone could give me some pointers on what
>>> to do, I can keep trying to dig through this.
>>
>>
>> This bisection brings suspects:
>>
>> e3416ab2d156 iser-target: Kill the ->isert_cmd back pointer in struct
>> iser_tx_desc
>> d1ca2ed7dcf8 iser-target: Kill struct isert_rdma_wr
>> 9679cc51eb13 iser-target: Convert to new CQ API
>> 5adabdd122e4 iser-target: Split and properly type the login buffer
>> ed1083b251f0 iser-target: Remove ISER_RECV_DATA_SEG_LEN
>> 26c7b673db57 iser-target: Remove impossible condition from isert_wait_conn
>> 69c48846f1c7 iser-target: Remove redundant wait in release_conn
>> 6d1fba0c2cc7 iser-target: Rework connection termination
>> f81bf458208e iser-target: Separate flows for np listeners and connections
>> cma events
>> aea92980601f iser-target: Add new state ISER_CONN_BOUND to isert_conn
>> b89a7c25462b iser-target: Fix identification of login rx descriptor type
>>
>> However I don't really see performance implications in these patches,
>> not to mention something that would affect on ConnectIB...
>>
>> Given that your bisection brings up target side patches, I have
>> a couple questions:
>>
>> 1. Are the CPU usage in the target side at 100%, or the initiator side
>> is the bottleneck?
>>
>> 2. Would it be possible to use another target implementation? TGT maybe?
>>
>> 3. Can you try testing right before 9679cc51eb13? This is a patch that
>> involves data-plane.
>>
>> 4. Can you try the latest upstream kernel? The iser target code uses
>> a generic data-transfer library and I'm interested in knowing what is
>> the status there.
>>
>> Cheers,
>> Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                                       ` <CAANLjFpeL0AkuGW-q5Bmm-dff0UqFOM_sAOaG7=vyqmwnOoTcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-22  8:18                                         ` Bart Van Assche
       [not found]                                           ` <86d4404a-fa6a-72de-8e83-827072c308b5-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2016-06-22  9:52                                         ` Sagi Grimberg
  1 sibling, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2016-06-22  8:18 UTC (permalink / raw)
  To: Robert LeBlanc, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Max Gurtovoy

On 06/21/2016 10:26 PM, Robert LeBlanc wrote:
> Srpt keeps crashing couldn't test

If this is reproducible with the latest rc kernel or with any of the 
stable kernels please report this in a separate e-mail, together with 
the crash call stack and information about how to reproduce this.

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                                       ` <CAANLjFpeL0AkuGW-q5Bmm-dff0UqFOM_sAOaG7=vyqmwnOoTcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-06-22  8:18                                         ` Bart Van Assche
@ 2016-06-22  9:52                                         ` Sagi Grimberg
  1 sibling, 0 replies; 20+ messages in thread
From: Sagi Grimberg @ 2016-06-22  9:52 UTC (permalink / raw)
  To: Robert LeBlanc
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Max Gurtovoy

> Sagi & Max,
>
> Here is the results of SRP using the same ramdisk backstore that I was
> using from iSER (as same as can be between reboots and restoring
> targetcli config). I also tested the commit before 9679cc51eb13
> (5adabdd122e471fe978d49471624bab08b5373a7) which is included here. I'm
> not seeing a correlation between iSER and SRP that would lead me to
> believe that the changes are happening in both implementations.
>
> Does this provide enough information for you, or do you think TGT will
> be needed?

I'm a little lost on which test belongs to what; can you specify that
more clearly?

>
> 4.4 (afd2ff9) vanilla default config
> sdb;10.218.128.17;5150176;1287544;16288
> sdd;10.218.202.17;5092337;1273084;16473
> sdh;10.218.203.17;5129078;1282269;16355
> sdk;10.218.204.17;5129078;1282269;16355
> sdg;10.219.128.17;5155874;1288968;16270
> sdf;10.219.202.17;5131588;1282897;16347
> sdi;10.219.203.17;5165399;1291349;16240
> sdl;10.219.204.17;5157459;1289364;16265
> sdc;10.220.128.17;3684223;921055;22769
> sde;10.220.202.17;3692169;923042;22720
> sdj;10.220.203.17;3699170;924792;22677
> Sdm;10.220.204.17;3697865;924466;22685
>
> mlx5_0;sde;2968368;742092;28260
> mlx4_0;sdd;3325645;831411;25224
> mlx5_0;sdc;3023466;755866;27745
>
> 4.4.0_rc2_3e27c920
> sdc;10.218.128.17;5291495;1322873;15853
> sde;10.218.202.17;4966024;1241506;16892
> sdh;10.218.203.17;4980471;1245117;16843
> sdk;10.218.204.17;4966612;1241653;16890
> sdd;10.219.128.17;5060084;1265021;16578
> sdf;10.219.202.17;5065278;1266319;16561
> sdi;10.219.203.17;5047600;1261900;16619
> sdl;10.219.204.17;5036992;1259248;16654
> sdn;10.220.128.17;3775081;943770;22221
> sdg;10.220.202.17;3758336;939584;22320
> sdj;10.220.203.17;3792832;948208;22117
> Sdm;10.220.204.17;3771516;942879;22242
>
> Mlx4_0;sde;4648715;1162178;18045  ~73% cpu ib_srpt_compl
> Mlx5_0;sdd;3476566;869141;24129 ~80% cpu ib_srpt_compl
> mlx5_0;sdc;3492343;873085;24020
>
> 4.4.0_rc2_ab46db0a
> sdc;10.218.128.17;3792146;948036;22121
> sdf;10.218.202.17;3738405;934601;22439
> sdj;10.218.203.17;3764239;941059;22285
> sdl;10.218.204.17;3785302;946325;22161
> sdd;10.219.128.17;3762382;940595;22296
> sdg;10.219.202.17;3765760;941440;22276
> sdi;10.219.203.17;3873751;968437;21655
> sdm;10.219.204.17;3769483;942370;22254
> sde;10.220.128.17;5022517;1255629;16702
> sdh;10.220.202.17;5018911;1254727;16714
> sdk;10.220.203.17;5037295;1259323;16653
> Sdn;10.220.204.17;5033064;1258266;16667
>
> mlx4_0;sde;4635358;1158839;18097
> mlx5_0;sdd;3459077;864769;24251
> mlx5_0;sdc;3465650;866412;24205
>
> 4.5.0_rc3_1aaa57f5_00399
>
> sdc;10.218.128.17;4627942;1156985;18126
> sdf;10.218.202.17;4590963;1147740;18272
> sdk;10.218.203.17;4564980;1141245;18376
> sdn;10.218.204.17;4571946;1142986;18348
> sdd;10.219.128.17;4591717;1147929;18269
> sdi;10.219.202.17;4505644;1126411;18618
> sdg;10.219.203.17;4562001;1140500;18388
> sdl;10.219.204.17;4583187;1145796;18303
> sde;10.220.128.17;5511568;1377892;15220
> sdh;10.220.202.17;5515555;1378888;15209
> sdj;10.220.203.17;5609983;1402495;14953
> sdm;10.220.204.17;5509035;1377258;15227
>
> Mlx5_0;sde;3593013;898253;23347 100% CPU kworker/u69:2
> Mlx5_0;sdd;3588555;897138;23376 100% CPU kworker/u69:2
> Mlx4_0;sdc;3525662;881415;23793 100% CPU kworker/u68:0
>
> 4.5.0_rc5_7861728d_00001
> sdc;10.218.128.17;3747591;936897;22384
> sdf;10.218.202.17;3750607;937651;22366
> sdh;10.218.203.17;3750439;937609;22367
> sdn;10.218.204.17;3771008;942752;22245
> sde;10.219.128.17;3867678;966919;21689
> sdg;10.219.202.17;3781889;945472;22181
> sdk;10.219.203.17;3791804;947951;22123
> sdl;10.219.204.17;3795406;948851;22102
> sdd;10.220.128.17;5039110;1259777;16647
> sdi;10.220.202.17;4992921;1248230;16801
> sdj;10.220.203.17;5015610;1253902;16725
> Sdm;10.220.204.17;5087087;1271771;16490
>
> Mlx5_0;sde;2930722;732680;28623 ~98% CPU kworker/u69:0
> Mlx5_0;sdd;2910891;727722;28818 ~98% CPU kworker/u69:0
> Mlx4_0;sdc;3263668;815917;25703 ~98% CPU kworker/u68:0
>
> 4.5.0_rc5_f81bf458_00018
> sdb;10.218.128.17;5023720;1255930;16698
> sde;10.218.202.17;5016809;1254202;16721
> sdj;10.218.203.17;5021915;1255478;16704
> sdk;10.218.204.17;5021314;1255328;16706
> sdc;10.219.128.17;4984318;1246079;16830
> sdf;10.219.202.17;4986096;1246524;16824
> sdh;10.219.203.17;5043958;1260989;16631
> sdm;10.219.204.17;5032460;1258115;16669
> sdd;10.220.128.17;3736740;934185;22449
> sdg;10.220.202.17;3728767;932191;22497
> sdi;10.220.203.17;3752117;938029;22357
> Sdl;10.220.204.17;3763901;940975;22287
>
> Srpt keeps crashing couldn't test
>
> 4.5.0_rc5_5adabdd1_00023
> Sdc;10.218.128.17;3726448;931612;22511 ~97% CPU kworker/u69:4
> sdf;10.218.202.17;3750271;937567;22368
> sdi;10.218.203.17;3749266;937316;22374
> sdj;10.218.204.17;3798844;949711;22082
> sde;10.219.128.17;3759852;939963;22311 ~97% CPU kworker/u69:4
> sdg;10.219.202.17;3772534;943133;22236
> sdl;10.219.203.17;3769483;942370;22254
> sdn;10.219.204.17;3790604;947651;22130
> sdd;10.220.128.17;5171130;1292782;16222 ~96% CPU kworker/u68:3
> sdh;10.220.202.17;5105354;1276338;16431
> sdk;10.220.203.17;4995300;1248825;16793
> sdm;10.220.204.17;4959564;1239891;16914
>
> Srpt crashes
>
> 4.5.0_rc5_07b63196_00027
> sdb;10.218.128.17;3606142;901535;23262
> sdg;10.218.202.17;3570988;892747;23491
> sdf;10.218.203.17;3576011;894002;23458
> sdk;10.218.204.17;3558113;889528;23576
> sdc;10.219.128.17;3577384;894346;23449
> sde;10.219.202.17;3575401;893850;23462
> sdj;10.219.203.17;3567798;891949;23512
> sdl;10.219.204.17;3584262;896065;23404
> sdd;10.220.128.17;4430680;1107670;18933
> sdh;10.220.202.17;4488286;1122071;18690
> sdi;10.220.203.17;4487326;1121831;18694
> sdm;10.220.204.17;4441236;1110309;18888
>
> Srpt crashes
>
> 4.5.0_rc5_5e47f198_00036
> sdb;10.218.128.17;3519597;879899;23834
> sdi;10.218.202.17;3512229;878057;23884
> sdh;10.218.203.17;3518563;879640;23841
> sdk;10.218.204.17;3582119;895529;23418
> sdd;10.219.128.17;3550883;887720;23624
> sdj;10.219.202.17;3558415;889603;23574
> sde;10.219.203.17;3552086;888021;23616
> sdl;10.219.204.17;3579521;894880;23435
> sdc;10.220.128.17;4532912;1133228;18506
> sdf;10.220.202.17;4558035;1139508;18404
> sdg;10.220.203.17;4601035;1150258;18232
> sdm;10.220.204.17;4548150;1137037;18444
>
> srpt crashes
>
> 4.6.2 vanilla default config
> sde;10.218.128.17;3431063;857765;24449
> sdf;10.218.202.17;3360685;840171;24961
> sdi;10.218.203.17;3355174;838793;25002
> sdm;10.218.204.17;3360955;840238;24959
> sdd;10.219.128.17;3337288;834322;25136
> sdh;10.219.202.17;3327492;831873;25210
> sdj;10.219.203.17;3380867;845216;24812
> sdk;10.219.204.17;3418340;854585;24540
> sdc;10.220.128.17;4668377;1167094;17969
> sdg;10.220.202.17;4716675;1179168;17785
> sdl;10.220.203.17;4675663;1168915;17941
> sdn;10.220.204.17;4631519;1157879;18112
>
> Mlx5_0;sde;3390021;847505;24745 ~98% CPU kworker/u69:3
> Mlx5_0;sdd;3207512;801878;26153 ~98% CPU kworker/u69:3
> Mlx4_0;sdc;2998072;749518;27980 ~98% CPU kworker/u68:0
>
> 4.7.0_rc3_5edb5649
> sdc;10.218.128.17;3260244;815061;25730
> sdg;10.218.202.17;3405988;851497;24629
> sdh;10.218.203.17;3307419;826854;25363
> sdm;10.218.204.17;3430502;857625;24453
> sdi;10.219.128.17;3544282;886070;23668
> sdj;10.219.202.17;3412083;853020;24585
> sdk;10.219.203.17;3422385;855596;24511
> sdl;10.219.204.17;3444164;861041;24356
> sdb;10.220.128.17;4803646;1200911;17463
> sdd;10.220.202.17;4832982;1208245;17357
> sde;10.220.203.17;4809430;1202357;17442
> sdf;10.220.204.17;4808878;1202219;17444
>
> mlx5_0;sdd;2986864;746716;28085
> mlx5_0;sdc;2963648;740912;28305
> mlx4_0;sdb;3317228;829307;25288
>
> Thanks,
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Tue, Jun 21, 2016 at 8:50 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
>> Sagi,
>>
>> I'm working to implement SRP (I think I got it all working) to test
>> some of the commits. I can try TGT afterwards and the commit you
>> mention. I haven't been watching the CPU lately, but before when I was
>> doing a lot of testing, there wasn't any one thread that was at 100%.
>> There are several threads that have high utilization, but none 100%
>> and there is plenty of CPU capacity available (32 cores). I can
>> capture some of that data if it is helpful. I did test 4.7_rc3 on
>> Friday, but it didn't change much, is that "new" enough?
>>
>> 4.7.0_rc3_5edb5649
>> sdc;10.218.128.17;3260244;815061;25730
>> sdg;10.218.202.17;3405988;851497;24629
>> sdh;10.218.203.17;3307419;826854;25363
>> sdm;10.218.204.17;3430502;857625;24453
>> sdi;10.219.128.17;3544282;886070;23668
>> sdj;10.219.202.17;3412083;853020;24585
>> sdk;10.219.203.17;3422385;855596;24511
>> sdl;10.219.204.17;3444164;861041;24356
>> sdb;10.220.128.17;4803646;1200911;17463
>> sdd;10.220.202.17;4832982;1208245;17357
>> sde;10.220.203.17;4809430;1202357;17442
>> sdf;10.220.204.17;4808878;1202219;17444
>>
>> Thanks for the suggestions, I'll work to get some of the requested
>> data back to you guys quickly.
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Tue, Jun 21, 2016 at 7:08 AM, Sagi Grimberg <sagigrim-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Hey Robert,
>>>
>>>> I narrowed the performance degradation to this series
>>>> 7861728..5e47f19, but while trying to bisect it, the changes were
>>>> erratic between each commit that I could not figure out exactly which
>>>> introduced the issue. If someone could give me some pointers on what
>>>> to do, I can keep trying to dig through this.
>>>
>>>
>>> This bisection brings suspects:
>>>
>>> e3416ab2d156 iser-target: Kill the ->isert_cmd back pointer in struct
>>> iser_tx_desc
>>> d1ca2ed7dcf8 iser-target: Kill struct isert_rdma_wr
>>> 9679cc51eb13 iser-target: Convert to new CQ API
>>> 5adabdd122e4 iser-target: Split and properly type the login buffer
>>> ed1083b251f0 iser-target: Remove ISER_RECV_DATA_SEG_LEN
>>> 26c7b673db57 iser-target: Remove impossible condition from isert_wait_conn
>>> 69c48846f1c7 iser-target: Remove redundant wait in release_conn
>>> 6d1fba0c2cc7 iser-target: Rework connection termination
>>> f81bf458208e iser-target: Separate flows for np listeners and connections
>>> cma events
>>> aea92980601f iser-target: Add new state ISER_CONN_BOUND to isert_conn
>>> b89a7c25462b iser-target: Fix identification of login rx descriptor type
>>>
>>> However I don't really see performance implications in these patches,
>>> not to mention something that would affect on ConnectIB...
>>>
>>> Given that your bisection brings up target side patches, I have
>>> a couple questions:
>>>
>>> 1. Are the CPU usage in the target side at 100%, or the initiator side
>>> is the bottleneck?
>>>
>>> 2. Would it be possible to use another target implementation? TGT maybe?
>>>
>>> 3. Can you try testing right before 9679cc51eb13? This is a patch that
>>> involves data-plane.
>>>
>>> 4. Can you try the latest upstream kernel? The iser target code uses
>>> a generic data-transfer library and I'm interested in knowing what is
>>> the status there.
>>>
>>> Cheers,
>>> Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                                           ` <86d4404a-fa6a-72de-8e83-827072c308b5-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2016-06-22 12:23                                             ` Laurence Oberman
  2016-06-22 15:45                                             ` Robert LeBlanc
  1 sibling, 0 replies; 20+ messages in thread
From: Laurence Oberman @ 2016-06-22 12:23 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Robert LeBlanc, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Max Gurtovoy



----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> To: "Robert LeBlanc" <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>, "Sagi Grimberg" <sagigrim-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, "Max Gurtovoy" <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Sent: Wednesday, June 22, 2016 4:18:31 AM
> Subject: Re: Connect-IB not performing as well as ConnectX-3 with iSER
> 
> On 06/21/2016 10:26 PM, Robert LeBlanc wrote:
> > Srpt keeps crashing couldn't test
> 
> If this is reproducible with the latest rc kernel or with any of the
> stable kernels please report this in a separate e-mail, together with
> the crash call stack and information about how to reproduce this.
> 
> Thanks,
> 
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Robert

I am exercising ib_srpt, configured via targetcli, very heavily on 4.7.0-rc1.
I have no crashes or issues.
I also had 4.5 running ib_srpt with no crashes, although I had some other timeouts etc. depending on the load.

What sort of crashes are you talking about?
Does the system crash, or does ib_srpt dump a stack trace?

Thanks
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                                           ` <86d4404a-fa6a-72de-8e83-827072c308b5-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2016-06-22 12:23                                             ` Laurence Oberman
@ 2016-06-22 15:45                                             ` Robert LeBlanc
  1 sibling, 0 replies; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-22 15:45 UTC (permalink / raw)
  To: Bart Van Assche, Laurence Oberman
  Cc: Sagi Grimberg, linux-scsi-u79uwXL29TY76Z2rM5mHXA, Max Gurtovoy,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

There is no need to be concerned about srpt crashing in the latest
kernel. Srpt only crashed when I was testing kernels in the change set
(7861728..5e47f19) in which I identified the 10-15% iSER performance
drop between the 4.5 and 4.6 kernels. My tests from 4.6 to 4.7-rc3
didn't have a problem with srpt crashing.

The format of the output is as follows:

Kernel_tag_commit
iSER tests, with results in this format:
<dev>;<target IP>;<bandwidth KB/s>;<IOPs>;<execution time ms> (the last
three fields are fields 7, 8 and 9 from fio's minimal output)
e.g. sdc;10.218.128.17;3260244;815061;25730
SRP (LIO) tests, in this format:
<IB driver>;<dev>;<bandwidth KB/s>;<IOPs>;<execution time ms>
e.g. mlx5_0;sdd;2986864;746716;28085

This is repeated for each kernel tested. On some tests I also
documented the observed CPU utilization of some of the target
processes. In some cases I was lazy, and if the information was the
same for both mlx5 targets, I didn't duplicate it. For iSER, there are
four aliases on each adapter to provide four paths for each IB port
(this is a remnant of some previous multipathing tests, and now only
serves to provide additional data points to show how repeatable the
tests are). 10.218.*.17 and 10.219.*.17 are generally on the mlx5
ports while 10.220.*.17 is on the mlx4 port (some tests had the
adapters swapped, but none of these did, and it is easy to identify
them by the grouping).

This test was performed against each path individually. I created an
ext4 filesystem on the device (no partitions), then mounted the file
system via one path, ran the test, unmounted it, mounted the next
path, ran the test, and so on, so that there is no multipathing
confusing the tests. I am also _NOT_ running the tests on all paths at
the same time using fio.

The fio command I'm using is: fio --rw=read --bs=4K --size=2G
--numjobs=40 --name=worker.matt --group_reporting --minimal |  cut
-d';' -f7,8,9
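
A rough sketch of that per-path loop, for reference (the device list,
mount point, and device-to-IP mapping below are placeholders; my actual
script differs in detail):

MNT=/mnt/test
declare -A path_ip=( [sdc]=10.218.128.17 [sdd]=10.219.128.17 [sde]=10.220.128.17 )
for dev in sdc sdd sde; do
    mount /dev/$dev $MNT
    # run fio from the mounted filesystem and keep fields 7,8,9 of the minimal output
    result=$(cd $MNT && fio --rw=read --bs=4K --size=2G --numjobs=40 \
        --name=worker.matt --group_reporting --minimal | cut -d';' -f7,8,9)
    echo "$dev;${path_ip[$dev]};$result"
    umount $MNT
done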

I hope that clears up the confusion, if not, please ask for more clarification.

On Jun 22, 2016 2:18 AM, "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org> wrote:
>
> On 06/21/2016 10:26 PM, Robert LeBlanc wrote:
>>
>> Srpt keeps crashing couldn't test
>
>
> If this is reproducible with the latest rc kernel or with any of the stable kernels please report this in a separate e-mail, together with the crash call stack and information about how to reproduce this.
>
> Thanks,
>
> Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
  2016-06-21 20:26                                     ` Robert LeBlanc
       [not found]                                       ` <CAANLjFpeL0AkuGW-q5Bmm-dff0UqFOM_sAOaG7=vyqmwnOoTcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-22 16:21                                       ` Sagi Grimberg
       [not found]                                         ` <576ABB1B.4020509-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  1 sibling, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2016-06-22 16:21 UTC (permalink / raw)
  To: Robert LeBlanc, Sagi Grimberg; +Cc: linux-rdma, linux-scsi, Max Gurtovoy

Let me see if I get this correct:

> 4.5.0_rc3_1aaa57f5_00399
>
> sdc;10.218.128.17;4627942;1156985;18126
> sdf;10.218.202.17;4590963;1147740;18272
> sdk;10.218.203.17;4564980;1141245;18376
> sdn;10.218.204.17;4571946;1142986;18348
> sdd;10.219.128.17;4591717;1147929;18269
> sdi;10.219.202.17;4505644;1126411;18618
> sdg;10.219.203.17;4562001;1140500;18388
> sdl;10.219.204.17;4583187;1145796;18303
> sde;10.220.128.17;5511568;1377892;15220
> sdh;10.220.202.17;5515555;1378888;15209
> sdj;10.220.203.17;5609983;1402495;14953
> sdm;10.220.204.17;5509035;1377258;15227

In 1aaa57f5 you get on CIB ~115K IOPs per sd device
and on CX3 you get around 140K IOPs per sd device.

>
> Mlx5_0;sde;3593013;898253;23347 100% CPU kworker/u69:2
> Mlx5_0;sdd;3588555;897138;23376 100% CPU kworker/u69:2
> Mlx4_0;sdc;3525662;881415;23793 100% CPU kworker/u68:0

Is this on the host or the target?

> 4.5.0_rc5_7861728d_00001
> sdc;10.218.128.17;3747591;936897;22384
> sdf;10.218.202.17;3750607;937651;22366
> sdh;10.218.203.17;3750439;937609;22367
> sdn;10.218.204.17;3771008;942752;22245
> sde;10.219.128.17;3867678;966919;21689
> sdg;10.219.202.17;3781889;945472;22181
> sdk;10.219.203.17;3791804;947951;22123
> sdl;10.219.204.17;3795406;948851;22102
> sdd;10.220.128.17;5039110;1259777;16647
> sdi;10.220.202.17;4992921;1248230;16801
> sdj;10.220.203.17;5015610;1253902;16725
> Sdm;10.220.204.17;5087087;1271771;16490

In 7861728d you get on CIB ~95K IOPs per sd device
and on CX3 you get around 125K IOPs per sd device.

I don't see any difference in the code around iser/isert;
in fact, I don't see any commit in drivers/infiniband in this range.


>
> Mlx5_0;sde;2930722;732680;28623 ~98% CPU kworker/u69:0
> Mlx5_0;sdd;2910891;727722;28818 ~98% CPU kworker/u69:0
> Mlx4_0;sdc;3263668;815917;25703 ~98% CPU kworker/u68:0

Again, host or target?

> 4.5.0_rc5_f81bf458_00018
> sdb;10.218.128.17;5023720;1255930;16698
> sde;10.218.202.17;5016809;1254202;16721
> sdj;10.218.203.17;5021915;1255478;16704
> sdk;10.218.204.17;5021314;1255328;16706
> sdc;10.219.128.17;4984318;1246079;16830
> sdf;10.219.202.17;4986096;1246524;16824
> sdh;10.219.203.17;5043958;1260989;16631
> sdm;10.219.204.17;5032460;1258115;16669
> sdd;10.220.128.17;3736740;934185;22449
> sdg;10.220.202.17;3728767;932191;22497
> sdi;10.220.203.17;3752117;938029;22357
> Sdl;10.220.204.17;3763901;940975;22287

In f81bf458 you get on CIB ~125K IOPs per sd device
and on CX3 you get around 93K IOPs per sd device which
is the other way around? CIB is better than CX3?

The commits in this gap are:
f81bf458208e iser-target: Separate flows for np listeners and 
connections cma events
aea92980601f iser-target: Add new state ISER_CONN_BOUND to isert_conn
b89a7c25462b iser-target: Fix identification of login rx descriptor type

None of those should affect the data-path.

>
> Srpt keeps crashing couldn't test
>
> 4.5.0_rc5_5adabdd1_00023
> Sdc;10.218.128.17;3726448;931612;22511 ~97% CPU kworker/u69:4
> sdf;10.218.202.17;3750271;937567;22368
> sdi;10.218.203.17;3749266;937316;22374
> sdj;10.218.204.17;3798844;949711;22082
> sde;10.219.128.17;3759852;939963;22311 ~97% CPU kworker/u69:4
> sdg;10.219.202.17;3772534;943133;22236
> sdl;10.219.203.17;3769483;942370;22254
> sdn;10.219.204.17;3790604;947651;22130
> sdd;10.220.128.17;5171130;1292782;16222 ~96% CPU kworker/u68:3
> sdh;10.220.202.17;5105354;1276338;16431
> sdk;10.220.203.17;4995300;1248825;16793
> sdm;10.220.204.17;4959564;1239891;16914

In 5adabdd1 you get on CIB ~94K IOPs per sd device
and on CX3 you get around 130K IOPs per sd device
which means you flipped again (very strange).

The commits in this gap are:
5adabdd122e4 iser-target: Split and properly type the login buffer
ed1083b251f0 iser-target: Remove ISER_RECV_DATA_SEG_LEN
26c7b673db57 iser-target: Remove impossible condition from isert_wait_conn
69c48846f1c7 iser-target: Remove redundant wait in release_conn
6d1fba0c2cc7 iser-target: Rework connection termination

Again, none are suspected to implicate the data-plane.

> Srpt crashes
>
> 4.5.0_rc5_07b63196_00027
> sdb;10.218.128.17;3606142;901535;23262
> sdg;10.218.202.17;3570988;892747;23491
> sdf;10.218.203.17;3576011;894002;23458
> sdk;10.218.204.17;3558113;889528;23576
> sdc;10.219.128.17;3577384;894346;23449
> sde;10.219.202.17;3575401;893850;23462
> sdj;10.219.203.17;3567798;891949;23512
> sdl;10.219.204.17;3584262;896065;23404
> sdd;10.220.128.17;4430680;1107670;18933
> sdh;10.220.202.17;4488286;1122071;18690
> sdi;10.220.203.17;4487326;1121831;18694
> sdm;10.220.204.17;4441236;1110309;18888

In 07b63196 you get on CIB ~89K IOPs per sd device
and on CX3 you get around 112K IOPs per sd device

The commits in this gap are:
e3416ab2d156 iser-target: Kill the ->isert_cmd back pointer in struct 
iser_tx_desc
d1ca2ed7dcf8 iser-target: Kill struct isert_rdma_wr
9679cc51eb13 iser-target: Convert to new CQ API

These do affect the data-path, but nothing there can explain
a CIB-specific issue. Moreover, the perf drop happened before that.

> Srpt crashes
>
> 4.5.0_rc5_5e47f198_00036
> sdb;10.218.128.17;3519597;879899;23834
> sdi;10.218.202.17;3512229;878057;23884
> sdh;10.218.203.17;3518563;879640;23841
> sdk;10.218.204.17;3582119;895529;23418
> sdd;10.219.128.17;3550883;887720;23624
> sdj;10.219.202.17;3558415;889603;23574
> sde;10.219.203.17;3552086;888021;23616
> sdl;10.219.204.17;3579521;894880;23435
> sdc;10.220.128.17;4532912;1133228;18506
> sdf;10.220.202.17;4558035;1139508;18404
> sdg;10.220.203.17;4601035;1150258;18232
> sdm;10.220.204.17;4548150;1137037;18444

Same results, and no commits were added, so that makes sense.

> srpt crashes
>
> 4.6.2 vanilla default config
> sde;10.218.128.17;3431063;857765;24449
> sdf;10.218.202.17;3360685;840171;24961
> sdi;10.218.203.17;3355174;838793;25002
> sdm;10.218.204.17;3360955;840238;24959
> sdd;10.219.128.17;3337288;834322;25136
> sdh;10.219.202.17;3327492;831873;25210
> sdj;10.219.203.17;3380867;845216;24812
> sdk;10.219.204.17;3418340;854585;24540
> sdc;10.220.128.17;4668377;1167094;17969
> sdg;10.220.202.17;4716675;1179168;17785
> sdl;10.220.203.17;4675663;1168915;17941
> sdn;10.220.204.17;4631519;1157879;18112
>
> Mlx5_0;sde;3390021;847505;24745 ~98% CPU kworker/u69:3
> Mlx5_0;sdd;3207512;801878;26153 ~98% CPU kworker/u69:3
> Mlx4_0;sdc;2998072;749518;27980 ~98% CPU kworker/u68:0
>
> 4.7.0_rc3_5edb5649
> sdc;10.218.128.17;3260244;815061;25730
> sdg;10.218.202.17;3405988;851497;24629
> sdh;10.218.203.17;3307419;826854;25363
> sdm;10.218.204.17;3430502;857625;24453
> sdi;10.219.128.17;3544282;886070;23668
> sdj;10.219.202.17;3412083;853020;24585
> sdk;10.219.203.17;3422385;855596;24511
> sdl;10.219.204.17;3444164;861041;24356
> sdb;10.220.128.17;4803646;1200911;17463
> sdd;10.220.202.17;4832982;1208245;17357
> sde;10.220.203.17;4809430;1202357;17442
> sdf;10.220.204.17;4808878;1202219;17444


Here there is the new rdma_rw API, which doesn't
make a difference in performance (but no improvement
either).


------------------
So, all in all, I still don't know what the root cause could be
here.

You mentioned that you are running fio over a filesystem. Is
it possible to run your tests directly over the block devices? And
can you run fio with DIRECT-IO?

Also, usually iser, srp and other rdma ULPs are sensitive to
the IRQ assignments of the HCA. An incorrect IRQ affinity assignment
might bring all sorts of noise to performance tests. The normal
practice to get the most out of the HCA is usually to spread the
IRQ assignments linearly on all CPUs
(https://community.mellanox.com/docs/DOC-1483).
Did you perform any steps to spread the interrupts? Is the irqbalance
daemon on?
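
For reference, a minimal sketch of spreading the HCA IRQs linearly by
hand (the awk pattern and the systemd irqbalance unit are assumptions
about the setup; check /proc/interrupts for the exact IRQ names):

systemctl stop irqbalance
cpus=$(nproc)
i=0
# round-robin every mlx4/mlx5 interrupt across all online CPUs
for irq in $(awk -F: '/mlx[45]/ {print $1}' /proc/interrupts); do
    echo $((i % cpus)) > /proc/irq/$irq/smp_affinity_list
    i=$((i + 1))
done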

It would be good to try and isolate the drop and make sure it
is real and not randomly generated due to some noise in the form of
IRQ assignments.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                                         ` <576ABB1B.4020509-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2016-06-22 17:46                                           ` Robert LeBlanc
       [not found]                                             ` <CAANLjFqp8qStMCtcEjsoprfpD1=qnYguKU5+8rL9pkYwHv4PKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-22 17:46 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Max Gurtovoy

Sagi,

Yes, you are understanding the data correctly and seeing what I'm
seeing. I think you are also running into the same confusion I've had
while trying to figure this out. As for your questions about SRP: the
performance data is from the initiator and the CPU info is from the
target (all fio threads on the initiator showed low CPU utilization).

I spent a good day tweaking the IRQ assignments (spreading IRQs to all
cores, spreading to all cores on the NUMA node the card is attached
to, and spreading to all non-hyperthreaded cores on the NUMA node).
None of these provided any substantial gains or detriments (irqbalance
was not running). I don't know if there is IRQ steering going on, but
in some cases, with irqbalance not running, the IRQs would get pinned
back to the previous core(s) and I'd have to set them again. I did not
use the Mellanox scripts; I just did it by hand based on the
documents/scripts. I also offlined all cores on the second NUMA node,
which didn't help either. I got more performance gains from nomerges
(1 or 2 provided about the same gain, 2 slightly more) and the queue
settings. It seems that something in 1aaa57f5 was going right, as both
cards performed very well without needing any IRQ fudging.
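
For what it's worth, the knobs I mean look roughly like this (sdX is a
placeholder for whichever imported device is under test; nr_requests is
shown only as an example of a queue setting):

echo 2 > /sys/block/sdX/queue/nomerges        # 2 disables all merges, 1 only the complex ones
echo 1024 > /sys/block/sdX/queue/nr_requests  # example queue-depth knob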

I understand that there are many moving parts to try to figure out; it
could be anywhere in the IB drivers, LIO, or even the SCSI subsystem,
the RAM disk implementation, or the file system. However, since the
performance is bouncing between cards, it seems unlikely to be
something common to both (except when both cards show a loss/gain);
but as you mentioned, there doesn't seem to be any rhyme or reason to
the shifts.

I haven't been using the raw block device in these tests. When I did
before, after one thread read the data, another thread reading the
same block would get it from the page cache, invalidating the test. I
could only saturate the path/port with highly threaded jobs, so I may
have to partition out the disk for block testing. When I ran the tests
using direct I/O, the performance was far lower and it was harder for
me to know when I was reaching the theoretical max of the
card/links/PCIe. I may just have my scripts run the three tests in
succession.
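
Two options I may script for the buffered block-device runs so one
job's reads aren't served from another job's page cache (sdX is a
placeholder; offset_increment keeps each of the 40 jobs on its own 2G
slice, so the device needs at least 80G):

sync; echo 3 > /proc/sys/vm/drop_caches   # drop the page cache between runs
fio --filename=/dev/sdX --rw=read --bs=4K --size=2G --offset_increment=2G \
    --numjobs=40 --name=worker.matt --group_reporting --minimal | cut -d';' -f7,8,9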

Thanks for looking at this. Please let me know what you think would be
most helpful so that I'm making the best use of your and my time.

Thanks,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jun 22, 2016 at 10:21 AM, Sagi Grimberg <sagi-ImC7XgPzLAfvYQKSrp0J2Q@public.gmane.org> wrote:
> Let me see if I get this correct:
>
>> 4.5.0_rc3_1aaa57f5_00399
>>
>> sdc;10.218.128.17;4627942;1156985;18126
>> sdf;10.218.202.17;4590963;1147740;18272
>> sdk;10.218.203.17;4564980;1141245;18376
>> sdn;10.218.204.17;4571946;1142986;18348
>> sdd;10.219.128.17;4591717;1147929;18269
>> sdi;10.219.202.17;4505644;1126411;18618
>> sdg;10.219.203.17;4562001;1140500;18388
>> sdl;10.219.204.17;4583187;1145796;18303
>> sde;10.220.128.17;5511568;1377892;15220
>> sdh;10.220.202.17;5515555;1378888;15209
>> sdj;10.220.203.17;5609983;1402495;14953
>> sdm;10.220.204.17;5509035;1377258;15227
>
>
> In 1aaa57f5 you get on CIB ~115K IOPs per sd device
> and on CX3 you get around 140K IOPs per sd device.
>
>>
>> Mlx5_0;sde;3593013;898253;23347 100% CPU kworker/u69:2
>> Mlx5_0;sdd;3588555;897138;23376 100% CPU kworker/u69:2
>> Mlx4_0;sdc;3525662;881415;23793 100% CPU kworker/u68:0
>
>
> Is this on the host or the target?
>
>> 4.5.0_rc5_7861728d_00001
>> sdc;10.218.128.17;3747591;936897;22384
>> sdf;10.218.202.17;3750607;937651;22366
>> sdh;10.218.203.17;3750439;937609;22367
>> sdn;10.218.204.17;3771008;942752;22245
>> sde;10.219.128.17;3867678;966919;21689
>> sdg;10.219.202.17;3781889;945472;22181
>> sdk;10.219.203.17;3791804;947951;22123
>> sdl;10.219.204.17;3795406;948851;22102
>> sdd;10.220.128.17;5039110;1259777;16647
>> sdi;10.220.202.17;4992921;1248230;16801
>> sdj;10.220.203.17;5015610;1253902;16725
>> Sdm;10.220.204.17;5087087;1271771;16490
>
>
> In 7861728d you get on CIB ~95K IOPs per sd device
> and on CX3 you get around 125K IOPs per sd device.
>
> I don't see any difference in the code around iser/isert,
> in fact, I don't see any commit in drivers/infiniband
>
>
>>
>> Mlx5_0;sde;2930722;732680;28623 ~98% CPU kworker/u69:0
>> Mlx5_0;sdd;2910891;727722;28818 ~98% CPU kworker/u69:0
>> Mlx4_0;sdc;3263668;815917;25703 ~98% CPU kworker/u68:0
>
>
> Again, host or target?
>
>> 4.5.0_rc5_f81bf458_00018
>> sdb;10.218.128.17;5023720;1255930;16698
>> sde;10.218.202.17;5016809;1254202;16721
>> sdj;10.218.203.17;5021915;1255478;16704
>> sdk;10.218.204.17;5021314;1255328;16706
>> sdc;10.219.128.17;4984318;1246079;16830
>> sdf;10.219.202.17;4986096;1246524;16824
>> sdh;10.219.203.17;5043958;1260989;16631
>> sdm;10.219.204.17;5032460;1258115;16669
>> sdd;10.220.128.17;3736740;934185;22449
>> sdg;10.220.202.17;3728767;932191;22497
>> sdi;10.220.203.17;3752117;938029;22357
>> Sdl;10.220.204.17;3763901;940975;22287
>
>
> In f81bf458 you get on CIB ~125K IOPs per sd device
> and on CX3 you get around 93K IOPs per sd device which
> is the other way around? CIB is better than CX3?
>
> The commits in this gap are:
> f81bf458208e iser-target: Separate flows for np listeners and connections
> cma events
> aea92980601f iser-target: Add new state ISER_CONN_BOUND to isert_conn
> b89a7c25462b iser-target: Fix identification of login rx descriptor type
>
> None of those should affect the data-path.
>
>>
>> Srpt keeps crashing couldn't test
>>
>> 4.5.0_rc5_5adabdd1_00023
>> Sdc;10.218.128.17;3726448;931612;22511 ~97% CPU kworker/u69:4
>> sdf;10.218.202.17;3750271;937567;22368
>> sdi;10.218.203.17;3749266;937316;22374
>> sdj;10.218.204.17;3798844;949711;22082
>> sde;10.219.128.17;3759852;939963;22311 ~97% CPU kworker/u69:4
>> sdg;10.219.202.17;3772534;943133;22236
>> sdl;10.219.203.17;3769483;942370;22254
>> sdn;10.219.204.17;3790604;947651;22130
>> sdd;10.220.128.17;5171130;1292782;16222 ~96% CPU kworker/u68:3
>> sdh;10.220.202.17;5105354;1276338;16431
>> sdk;10.220.203.17;4995300;1248825;16793
>> sdm;10.220.204.17;4959564;1239891;16914
>
>
> In 5adabdd1 you get on CIB ~94K IOPs per sd device
> and on CX3 you get around 130K IOPs per sd device
> which means you flipped again (very strange).
>
> The commits in this gap are:
> 5adabdd122e4 iser-target: Split and properly type the login buffer
> ed1083b251f0 iser-target: Remove ISER_RECV_DATA_SEG_LEN
> 26c7b673db57 iser-target: Remove impossible condition from isert_wait_conn
> 69c48846f1c7 iser-target: Remove redundant wait in release_conn
> 6d1fba0c2cc7 iser-target: Rework connection termination
>
> Again, none are suspected to implicate the data-plane.
>
>> Srpt crashes
>>
>> 4.5.0_rc5_07b63196_00027
>> sdb;10.218.128.17;3606142;901535;23262
>> sdg;10.218.202.17;3570988;892747;23491
>> sdf;10.218.203.17;3576011;894002;23458
>> sdk;10.218.204.17;3558113;889528;23576
>> sdc;10.219.128.17;3577384;894346;23449
>> sde;10.219.202.17;3575401;893850;23462
>> sdj;10.219.203.17;3567798;891949;23512
>> sdl;10.219.204.17;3584262;896065;23404
>> sdd;10.220.128.17;4430680;1107670;18933
>> sdh;10.220.202.17;4488286;1122071;18690
>> sdi;10.220.203.17;4487326;1121831;18694
>> sdm;10.220.204.17;4441236;1110309;18888
>
>
> In 5adabdd1 you get on CIB ~89K IOPs per sd device
> and on CX3 you get around 112K IOPs per sd device
>
> The commits in this gap are:
> e3416ab2d156 iser-target: Kill the ->isert_cmd back pointer in struct
> iser_tx_desc
> d1ca2ed7dcf8 iser-target: Kill struct isert_rdma_wr
> 9679cc51eb13 iser-target: Convert to new CQ API
>
> Which do effect the data-path, but nothing that can explain
> a specific CIB issue. Moreover, the perf drop happened before that.
>
>> Srpt crashes
>>
>> 4.5.0_rc5_5e47f198_00036
>> sdb;10.218.128.17;3519597;879899;23834
>> sdi;10.218.202.17;3512229;878057;23884
>> sdh;10.218.203.17;3518563;879640;23841
>> sdk;10.218.204.17;3582119;895529;23418
>> sdd;10.219.128.17;3550883;887720;23624
>> sdj;10.219.202.17;3558415;889603;23574
>> sde;10.219.203.17;3552086;888021;23616
>> sdl;10.219.204.17;3579521;894880;23435
>> sdc;10.220.128.17;4532912;1133228;18506
>> sdf;10.220.202.17;4558035;1139508;18404
>> sdg;10.220.203.17;4601035;1150258;18232
>> sdm;10.220.204.17;4548150;1137037;18444
>
>
> Same results, and no commit added so makes sense.
>
>
>> srpt crashes
>>
>> 4.6.2 vanilla default config
>> sde;10.218.128.17;3431063;857765;24449
>> sdf;10.218.202.17;3360685;840171;24961
>> sdi;10.218.203.17;3355174;838793;25002
>> sdm;10.218.204.17;3360955;840238;24959
>> sdd;10.219.128.17;3337288;834322;25136
>> sdh;10.219.202.17;3327492;831873;25210
>> sdj;10.219.203.17;3380867;845216;24812
>> sdk;10.219.204.17;3418340;854585;24540
>> sdc;10.220.128.17;4668377;1167094;17969
>> sdg;10.220.202.17;4716675;1179168;17785
>> sdl;10.220.203.17;4675663;1168915;17941
>> sdn;10.220.204.17;4631519;1157879;18112
>>
>> Mlx5_0;sde;3390021;847505;24745 ~98% CPU kworker/u69:3
>> Mlx5_0;sdd;3207512;801878;26153 ~98% CPU kworker/u69:3
>> Mlx4_0;sdc;2998072;749518;27980 ~98% CPU kworker/u68:0
>>
>> 4.7.0_rc3_5edb5649
>> sdc;10.218.128.17;3260244;815061;25730
>> sdg;10.218.202.17;3405988;851497;24629
>> sdh;10.218.203.17;3307419;826854;25363
>> sdm;10.218.204.17;3430502;857625;24453
>> sdi;10.219.128.17;3544282;886070;23668
>> sdj;10.219.202.17;3412083;853020;24585
>> sdk;10.219.203.17;3422385;855596;24511
>> sdl;10.219.204.17;3444164;861041;24356
>> sdb;10.220.128.17;4803646;1200911;17463
>> sdd;10.220.202.17;4832982;1208245;17357
>> sde;10.220.203.17;4809430;1202357;17442
>> sdf;10.220.204.17;4808878;1202219;17444
>
>
>
> Here there is a new rdma_rw api, which doesn't
> make a difference in performance (but no improvement
> also).
>
>
> ------------------
> So all in all I still don't know what can be the root-cause
> here.
>
> You mentioned that you are running fio over a filesystem. Is
> it possible to run your tests directly over the block devices? And
> can you run the fio with DIRECT-IO?
>
> Also, usually iser, srp and other rdma ULPs are sensitive to
> the IRQ assignments of the HCA. An incorrect IRQ affinity assignment
> might bring all sorts of noise to performance tests. The normal
> practice to get the most out of the HCA is usually to spread the
> IRQ assignments linearly on all CPUs
> (https://community.mellanox.com/docs/DOC-1483).
> Did you perform any steps to spread IRQ interrupts? is irqbalance daemon
> on?
>
> It would be good to try and isolate the drop and make sure it
> is real and not randomly generated due to some noise in the form of
> IRQ assignments.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Connect-IB not performing as well as ConnectX-3 with iSER
       [not found]                                             ` <CAANLjFqp8qStMCtcEjsoprfpD1=qnYguKU5+8rL9pkYwHv4PKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-06-24 18:34                                               ` Robert LeBlanc
  0 siblings, 0 replies; 20+ messages in thread
From: Robert LeBlanc @ 2016-06-24 18:34 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Max Gurtovoy

Sagi,

Here is an example of the different types of tests. This was only on one kernel.

The first two runs are to set a baseline. The lines starting with
buffer are fio with direct=0, the lines starting with direct are fio
with direct=1, and the lines starting with block are fio running
against a raw block device (technically 40 partitions on a single
drive) with direct=0. I also reduced the tests to one path per port
instead of four like before.
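
The three variants look roughly like this (paths are placeholders, and
the real block runs use 40 separate partitions, one per job, rather
than the single partition shown):

# buffer: filesystem, page cache allowed
(cd /mnt/test && fio --rw=read --bs=4K --size=2G --direct=0 --numjobs=40 \
    --name=worker.matt --group_reporting --minimal | cut -d';' -f7,8,9)
# direct: same filesystem, O_DIRECT
(cd /mnt/test && fio --rw=read --bs=4K --size=2G --direct=1 --numjobs=40 \
    --name=worker.matt --group_reporting --minimal | cut -d';' -f7,8,9)
# block: raw partition(s), buffered
fio --filename=/dev/sdc1 --rw=read --bs=4K --size=2G --direct=0 --numjobs=40 \
    --name=worker.matt --group_reporting --minimal | cut -d';' -f7,8,9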

# /root/run_path_tests.sh check-paths
#### Test all iSER paths individually ####
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3815778;953944;21984
buffer;sdd;10.219.128.17;3743744;935936;22407
buffer;sde;10.220.128.17;4915392;1228848;17066
direct;sdc;10.218.128.17;876644;219161;95690
direct;sdd;10.219.128.17;881684;220421;95143
direct;sde;10.220.128.17;892215;223053;94020
block;sdc;10.218.128.17;3890459;972614;21562
block;sdd;10.219.128.17;4127642;1031910;20323
block;sde;10.220.128.17;4939705;1234926;16982
# /root/run_path_tests.sh check-paths
#### Test all iSER paths individually ####
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3983572;995893;21058
buffer;sdd;10.219.128.17;3774231;943557;22226
buffer;sde;10.220.128.17;4856204;1214051;17274
direct;sdc;10.218.128.17;875820;218955;95780
direct;sdd;10.219.128.17;884072;221018;94886
direct;sde;10.220.128.17;902486;225621;92950
block;sdc;10.218.128.17;3790433;947608;22131
block;sdd;10.219.128.17;3860025;965006;21732
block;sde;10.220.128.17;4946404;1236601;16959

For the following test, I set the IRQ affinities on the initiator using
mlx_tune -p HIGH_THROUGHPUT, with irqbalance disabled.
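
Concretely, on the initiator (assuming a systemd host):

systemctl stop irqbalance
mlx_tune -p HIGH_THROUGHPUT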

# /root/run_path_tests.sh check-paths
#### Test all iSER paths individually ####
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3742742;935685;22413
buffer;sdd;10.219.128.17;3786327;946581;22155
buffer;sde;10.220.128.17;5009619;1252404;16745
direct;sdc;10.218.128.17;871942;217985;96206
direct;sdd;10.219.128.17;883467;220866;94951
direct;sde;10.220.128.17;901138;225284;93089
block;sdc;10.218.128.17;3911319;977829;21447
block;sdd;10.219.128.17;3758168;939542;22321
block;sde;10.220.128.17;4968377;1242094;16884

For the following test, I also set the IRQs on the target using
mlx_tune -p HIGH_THROUGHPUT and disabled irqbalance.

# /root/run_path_tests.sh check-paths
#### Test all iSER paths individually ####
4.5.0-rc5-5adabdd1-00023-g5adabdd
buffer;sdc;10.218.128.17;3804357;951089;22050
buffer;sdd;10.219.128.17;3767113;941778;22268
buffer;sde;10.220.128.17;4966612;1241653;16890
direct;sdc;10.218.128.17;879742;219935;95353
direct;sdd;10.219.128.17;886641;221660;94611
direct;sde;10.220.128.17;886857;221714;94588
block;sdc;10.218.128.17;3760864;940216;22305
block;sdd;10.219.128.17;3763564;940891;22289
block;sde;10.220.128.17;4965436;1241359;16894

It seems that mlx_tune helps marginally, but doesn't really provide
anything groundbreaking.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jun 22, 2016 at 11:46 AM, Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org> wrote:
> Sagi,
>
> Yes you are understanding the data correctly and what I'm seeing. I
> think you are also seeing the confusion that I've been running into
> trying to figure this out as well. As far as your questions about SRP,
> the performance data is from the initiator and the CPU info is from
> the target (all fio threads on the initiator were low CPU
> utilization).
>
> I spent a good day tweaking the IRQ assignments (spreading IRQs to all
> cores, spreading to all cores on the NUMA node the card is attached
> to, and spreading to all non-hyperthreaded cores on the NUMA node).
> None of these provided any substantial gains/detriments (irqbalance
> was not running). I don't know if there is IRQ steering going on, but
> in some cases with irqbalance not running the IRQs would get pinned
> back to the previous core(s) and I'd have to set them again. I did not
> use the Mellanox scripts, I just did it by hand based on the
> documents/scripts. I also offlined all cores on the second NUMA node
> which didn't help either. I got more performance gains with nomerges
> (1 or 2 provided about the same gain, 2 slightly more) and the queue.
> It seems that something in 1aaa57f5 was going right as both cards
> performed very well without needing any IRQ fudging.
>
> I understand that there are many moving parts to try and figure this
> out, it could be anywhere in the IB drivers, LIO, and even the SCSI
> sub systems, RAM disk implementation or file system. However since the
> performance is bouncing between cards, it seems it is unlikely
> something very common (except when both cards show a loss/gain), but
> as you mentioned, there doesn't seem to be any rhyme or reason to the
> shifts.
>
> I haven't been using the straight block device in these tests, before
> when I did, after one thread read the data, if another read that same
> block it then started reading it from cache invalidating the test. I
> could only saturate the path/port by highly threaded jobs, I may have
> to partition out the disk for block testing. When I ran the tests
> using direct I/O the performance was far lower and harder for me to
> know when I was reaching the theoretical max of the card/links/PCIe. I
> just may have my scripts run the three tests in succession.
>
> Thanks for looking at this. Please let me know what you think would be
> most helpful so that I'm making the best use of your and my time.
>
> Thanks,
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Jun 22, 2016 at 10:21 AM, Sagi Grimberg <sagi-ImC7XgPzLAfvYQKSrp0J2Q@public.gmane.org> wrote:
>> Let me see if I get this correct:
>>
>>> 4.5.0_rc3_1aaa57f5_00399
>>>
>>> sdc;10.218.128.17;4627942;1156985;18126
>>> sdf;10.218.202.17;4590963;1147740;18272
>>> sdk;10.218.203.17;4564980;1141245;18376
>>> sdn;10.218.204.17;4571946;1142986;18348
>>> sdd;10.219.128.17;4591717;1147929;18269
>>> sdi;10.219.202.17;4505644;1126411;18618
>>> sdg;10.219.203.17;4562001;1140500;18388
>>> sdl;10.219.204.17;4583187;1145796;18303
>>> sde;10.220.128.17;5511568;1377892;15220
>>> sdh;10.220.202.17;5515555;1378888;15209
>>> sdj;10.220.203.17;5609983;1402495;14953
>>> sdm;10.220.204.17;5509035;1377258;15227
>>
>>
>> In 1aaa57f5 you get on CIB ~115K IOPs per sd device
>> and on CX3 you get around 140K IOPs per sd device.
>>
>>>
>>> Mlx5_0;sde;3593013;898253;23347 100% CPU kworker/u69:2
>>> Mlx5_0;sdd;3588555;897138;23376 100% CPU kworker/u69:2
>>> Mlx4_0;sdc;3525662;881415;23793 100% CPU kworker/u68:0
>>
>>
>> Is this on the host or the target?
>>
>>> 4.5.0_rc5_7861728d_00001
>>> sdc;10.218.128.17;3747591;936897;22384
>>> sdf;10.218.202.17;3750607;937651;22366
>>> sdh;10.218.203.17;3750439;937609;22367
>>> sdn;10.218.204.17;3771008;942752;22245
>>> sde;10.219.128.17;3867678;966919;21689
>>> sdg;10.219.202.17;3781889;945472;22181
>>> sdk;10.219.203.17;3791804;947951;22123
>>> sdl;10.219.204.17;3795406;948851;22102
>>> sdd;10.220.128.17;5039110;1259777;16647
>>> sdi;10.220.202.17;4992921;1248230;16801
>>> sdj;10.220.203.17;5015610;1253902;16725
>>> sdm;10.220.204.17;5087087;1271771;16490
>>
>>
>> In 7861728d you get on CIB ~95K IOPs per sd device
>> and on CX3 you get around 125K IOPs per sd device.
>>
>> I don't see any difference in the code around iser/isert;
>> in fact, I don't see any commit touching drivers/infiniband in this range.
>>
>>
>>>
>>> mlx5_0;sde;2930722;732680;28623 ~98% CPU kworker/u69:0
>>> mlx5_0;sdd;2910891;727722;28818 ~98% CPU kworker/u69:0
>>> mlx4_0;sdc;3263668;815917;25703 ~98% CPU kworker/u68:0
>>
>>
>> Again, host or target?
>>
>>> 4.5.0_rc5_f81bf458_00018
>>> sdb;10.218.128.17;5023720;1255930;16698
>>> sde;10.218.202.17;5016809;1254202;16721
>>> sdj;10.218.203.17;5021915;1255478;16704
>>> sdk;10.218.204.17;5021314;1255328;16706
>>> sdc;10.219.128.17;4984318;1246079;16830
>>> sdf;10.219.202.17;4986096;1246524;16824
>>> sdh;10.219.203.17;5043958;1260989;16631
>>> sdm;10.219.204.17;5032460;1258115;16669
>>> sdd;10.220.128.17;3736740;934185;22449
>>> sdg;10.220.202.17;3728767;932191;22497
>>> sdi;10.220.203.17;3752117;938029;22357
>>> sdl;10.220.204.17;3763901;940975;22287
>>
>>
>> In f81bf458 you get on CIB ~125K IOPS per sd device
>> and on CX3 around 93K IOPS per sd device, which is the
>> other way around? Now CIB is better than CX3?
>>
>> The commits in this gap are:
>> f81bf458208e iser-target: Separate flows for np listeners and connections cma events
>> aea92980601f iser-target: Add new state ISER_CONN_BOUND to isert_conn
>> b89a7c25462b iser-target: Fix identification of login rx descriptor type
>>
>> None of those should affect the data-path.
>>
>>>
>>> srpt keeps crashing; couldn't test
>>>
>>> 4.5.0_rc5_5adabdd1_00023
>>> sdc;10.218.128.17;3726448;931612;22511 ~97% CPU kworker/u69:4
>>> sdf;10.218.202.17;3750271;937567;22368
>>> sdi;10.218.203.17;3749266;937316;22374
>>> sdj;10.218.204.17;3798844;949711;22082
>>> sde;10.219.128.17;3759852;939963;22311 ~97% CPU kworker/u69:4
>>> sdg;10.219.202.17;3772534;943133;22236
>>> sdl;10.219.203.17;3769483;942370;22254
>>> sdn;10.219.204.17;3790604;947651;22130
>>> sdd;10.220.128.17;5171130;1292782;16222 ~96% CPU kworker/u68:3
>>> sdh;10.220.202.17;5105354;1276338;16431
>>> sdk;10.220.203.17;4995300;1248825;16793
>>> sdm;10.220.204.17;4959564;1239891;16914
>>
>>
>> In 5adabdd1 you get on CIB ~94K IOPS per sd device
>> and on CX3 around 130K IOPS per sd device, which means
>> it flipped again (very strange).
>>
>> The commits in this gap are:
>> 5adabdd122e4 iser-target: Split and properly type the login buffer
>> ed1083b251f0 iser-target: Remove ISER_RECV_DATA_SEG_LEN
>> 26c7b673db57 iser-target: Remove impossible condition from isert_wait_conn
>> 69c48846f1c7 iser-target: Remove redundant wait in release_conn
>> 6d1fba0c2cc7 iser-target: Rework connection termination
>>
>> Again, none are suspected to implicate the data-plane.
>>
>>> Srpt crashes
>>>
>>> 4.5.0_rc5_07b63196_00027
>>> sdb;10.218.128.17;3606142;901535;23262
>>> sdg;10.218.202.17;3570988;892747;23491
>>> sdf;10.218.203.17;3576011;894002;23458
>>> sdk;10.218.204.17;3558113;889528;23576
>>> sdc;10.219.128.17;3577384;894346;23449
>>> sde;10.219.202.17;3575401;893850;23462
>>> sdj;10.219.203.17;3567798;891949;23512
>>> sdl;10.219.204.17;3584262;896065;23404
>>> sdd;10.220.128.17;4430680;1107670;18933
>>> sdh;10.220.202.17;4488286;1122071;18690
>>> sdi;10.220.203.17;4487326;1121831;18694
>>> sdm;10.220.204.17;4441236;1110309;18888
>>
>>
>> In 07b63196 you get on CIB ~89K IOPS per sd device
>> and on CX3 around 112K IOPS per sd device.
>>
>> The commits in this gap are:
>> e3416ab2d156 iser-target: Kill the ->isert_cmd back pointer in struct iser_tx_desc
>> d1ca2ed7dcf8 iser-target: Kill struct isert_rdma_wr
>> 9679cc51eb13 iser-target: Convert to new CQ API
>>
>> Which do effect the data-path, but nothing that can explain
>> a specific CIB issue. Moreover, the perf drop happened before that.
>>
>>> Srpt crashes
>>>
>>> 4.5.0_rc5_5e47f198_00036
>>> sdb;10.218.128.17;3519597;879899;23834
>>> sdi;10.218.202.17;3512229;878057;23884
>>> sdh;10.218.203.17;3518563;879640;23841
>>> sdk;10.218.204.17;3582119;895529;23418
>>> sdd;10.219.128.17;3550883;887720;23624
>>> sdj;10.219.202.17;3558415;889603;23574
>>> sde;10.219.203.17;3552086;888021;23616
>>> sdl;10.219.204.17;3579521;894880;23435
>>> sdc;10.220.128.17;4532912;1133228;18506
>>> sdf;10.220.202.17;4558035;1139508;18404
>>> sdg;10.220.203.17;4601035;1150258;18232
>>> sdm;10.220.204.17;4548150;1137037;18444
>>
>>
>> Same results, and no new commits were added, so that makes sense.
>>
>>
>>> srpt crashes
>>>
>>> 4.6.2 vanilla default config
>>> sde;10.218.128.17;3431063;857765;24449
>>> sdf;10.218.202.17;3360685;840171;24961
>>> sdi;10.218.203.17;3355174;838793;25002
>>> sdm;10.218.204.17;3360955;840238;24959
>>> sdd;10.219.128.17;3337288;834322;25136
>>> sdh;10.219.202.17;3327492;831873;25210
>>> sdj;10.219.203.17;3380867;845216;24812
>>> sdk;10.219.204.17;3418340;854585;24540
>>> sdc;10.220.128.17;4668377;1167094;17969
>>> sdg;10.220.202.17;4716675;1179168;17785
>>> sdl;10.220.203.17;4675663;1168915;17941
>>> sdn;10.220.204.17;4631519;1157879;18112
>>>
>>> mlx5_0;sde;3390021;847505;24745 ~98% CPU kworker/u69:3
>>> mlx5_0;sdd;3207512;801878;26153 ~98% CPU kworker/u69:3
>>> mlx4_0;sdc;2998072;749518;27980 ~98% CPU kworker/u68:0
>>>
>>> 4.7.0_rc3_5edb5649
>>> sdc;10.218.128.17;3260244;815061;25730
>>> sdg;10.218.202.17;3405988;851497;24629
>>> sdh;10.218.203.17;3307419;826854;25363
>>> sdm;10.218.204.17;3430502;857625;24453
>>> sdi;10.219.128.17;3544282;886070;23668
>>> sdj;10.219.202.17;3412083;853020;24585
>>> sdk;10.219.203.17;3422385;855596;24511
>>> sdl;10.219.204.17;3444164;861041;24356
>>> sdb;10.220.128.17;4803646;1200911;17463
>>> sdd;10.220.202.17;4832982;1208245;17357
>>> sde;10.220.203.17;4809430;1202357;17442
>>> sdf;10.220.204.17;4808878;1202219;17444
>>
>>
>>
>> Here there is the new rdma_rw API, which doesn't make a
>> difference in performance (no regression, but no improvement
>> either).
>>
>>
>> ------------------
>> So, all in all, I still don't know what the root cause could be
>> here.
>>
>> You mentioned that you are running fio over a filesystem. Is it
>> possible to run your tests directly against the block devices? And
>> can you run fio with direct I/O?
>>
>> Also, iSER, SRP, and other RDMA ULPs are usually sensitive to the
>> HCA's IRQ assignments. An incorrect IRQ affinity assignment can add
>> all sorts of noise to performance tests. The usual practice to get
>> the most out of the HCA is to spread the IRQ assignments linearly
>> across all CPUs
>> (https://community.mellanox.com/docs/DOC-1483).
>> Did you take any steps to spread the IRQs? Is the irqbalance daemon
>> running?
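>>
>> As a quick sanity check, something like the following (illustrative;
>> the "mlx" match may need adjusting to your interrupt names) shows
>> whether irqbalance is active and where the HCA vectors are currently
>> allowed to run:
>>
>>   systemctl is-active irqbalance
>>   for irq in $(awk '/mlx/ {sub(":","",$1); print $1}' /proc/interrupts); do
>>       printf 'IRQ %s -> CPUs ' "$irq"; cat /proc/irq/$irq/smp_affinity_list
>>   done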
>>
>> It would be good to try to isolate the drop and make sure it is
>> real, and not random noise caused by the IRQ assignments.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-06-24 18:34 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-06 22:36 Connect-IB not performing as well as ConnectX-3 with iSER Robert LeBlanc
     [not found] ` <CAANLjFoL5zow4f4RXP5t8LM7wsWN1OQ-hD2mtPUBTLkJ7UZ5kA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-07 12:02   ` Max Gurtovoy
     [not found]     ` <5756B7D2.5040009-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-06-07 16:48       ` Robert LeBlanc
     [not found]         ` <CAANLjFq4CoOSbng=aPHiSsFB=1HMSwAhhLiCjt+88dzz24OT9w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-07 22:37           ` Robert LeBlanc
     [not found]             ` <CAANLjFoLJNQWtHHqjHmhc0iBq14NAV_GgkbyQabjzyeN56t+Ow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-08 13:52               ` Max Gurtovoy
     [not found]                 ` <57582336.10407-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-06-08 15:33                   ` Robert LeBlanc
2016-06-10 21:36                     ` Robert LeBlanc
     [not found]                       ` <CAANLjFrv-0VArTEkgqbrhzFjn1fg_egpCJuQZnAurVrHjbL_qA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-20 15:23                         ` Robert LeBlanc
     [not found]                           ` <CAANLjFqoV-5HK0c+LdEbuxd81Vm=g=WE3cQgp47dH-yfYjZjGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-20 21:27                             ` Max Gurtovoy
     [not found]                               ` <3646a0c9-3f2d-66b8-c4da-c91ca1d01cee-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-06-20 21:52                                 ` Robert LeBlanc
2016-06-21 13:08                             ` Sagi Grimberg
     [not found]                               ` <57693C6A.3020805-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-06-21 14:50                                 ` Robert LeBlanc
     [not found]                                   ` <CAANLjFpUyAYB+ZzMwFKBpa4yLmALPzcRGJX1kExVrLARZmZRkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-21 20:26                                     ` Robert LeBlanc
     [not found]                                       ` <CAANLjFpeL0AkuGW-q5Bmm-dff0UqFOM_sAOaG7=vyqmwnOoTcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-22  8:18                                         ` Bart Van Assche
     [not found]                                           ` <86d4404a-fa6a-72de-8e83-827072c308b5-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-06-22 12:23                                             ` Laurence Oberman
2016-06-22 15:45                                             ` Robert LeBlanc
2016-06-22  9:52                                         ` Sagi Grimberg
2016-06-22 16:21                                       ` Sagi Grimberg
     [not found]                                         ` <576ABB1B.4020509-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-06-22 17:46                                           ` Robert LeBlanc
     [not found]                                             ` <CAANLjFqp8qStMCtcEjsoprfpD1=qnYguKU5+8rL9pkYwHv4PKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-06-24 18:34                                               ` Robert LeBlanc
