linux-rdma.vger.kernel.org archive mirror
* Re: regression: nvme rdma with bnxt_re0 broken
       [not found] ` <619411460.27128070.1562838433020.JavaMail.zimbra@redhat.com>
@ 2019-07-11 16:13   ` Sagi Grimberg
       [not found]   ` <AM0PR05MB48664657022ECA8526E3C967D1F30@AM0PR05MB4866.eurprd05.prod.outlook.com>
  1 sibling, 0 replies; 18+ messages in thread
From: Sagi Grimberg @ 2019-07-11 16:13 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme; +Cc: danielj, parav, linux-rdma

CC linux-rdma

On 7/11/19 2:47 AM, Yi Zhang wrote:
> Hello
> 
> 'nvme connect' fails when using bnxt_re0 on the latest upstream build [1].
> Bisecting shows the regression was introduced in v5.2.0-rc1 by commit [2]; it works after I revert it.
> Let me know if you need more info, thanks.
> 
> [1]
> [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n testnqn
> Failed to write to /dev/nvme-fabrics: Bad address
> 
> [root@rdma-perf-07 ~]$ dmesg
> [  476.320742] bnxt_en 0000:19:00.0: QPLIB: cmdq[0x4b9]=0x15 status 0x5
> [  476.327103] infiniband bnxt_re0: Failed to allocate HW AH
> [  476.332525] nvme nvme2: rdma_connect failed (-14).
> [  476.343552] nvme nvme2: rdma connection establishment failed (-14)
> 
> [root@rdma-perf-07 ~]$ lspci  | grep -i Broadcom
> 01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
> 01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
> 18:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
> 19:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
> 19:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
> 
> 
> [2]
> commit 823b23da71132b80d9f41ab667c68b112455f3b6
> Author: Parav Pandit <parav@mellanox.com>
> Date:   Wed Apr 10 11:23:03 2019 +0300
> 
>      IB/core: Allow vlan link local address based RoCE GIDs
>      
>      IPv6 link local address for a VLAN netdevice has nothing to do with its
>      resemblance with the default GID, because VLAN link local GID is in
>      different layer 2 domain.
>      
>      Now that RoCE MAD packet processing and route resolution consider the
>      right GID index, there is no need for an unnecessary check which prevents
>      the addition of vlan based IPv6 link local GIDs.
>      
>      Signed-off-by: Parav Pandit <parav@mellanox.com>
>      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
>      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> 
> 
> 
> Best Regards,
>    Yi Zhang
> 
> 
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
> 


* RE: regression: nvme rdma with bnxt_re0 broken
       [not found]   ` <AM0PR05MB48664657022ECA8526E3C967D1F30@AM0PR05MB4866.eurprd05.prod.outlook.com>
@ 2019-07-11 16:18     ` Parav Pandit
  2019-07-12  1:53       ` Yi Zhang
  0 siblings, 1 reply; 18+ messages in thread
From: Parav Pandit @ 2019-07-11 16:18 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme
  Cc: Daniel Jurgens, linux-rdma, Devesh Sharma, selvin.xavier

Sagi,

This is better one to cc to linux-rdma.

+ Devesh, Selvin.

> -----Original Message-----
> From: Parav Pandit
> Sent: Thursday, July 11, 2019 6:25 PM
> To: Yi Zhang <yi.zhang@redhat.com>; linux-nvme@lists.infradead.org
> Cc: Daniel Jurgens <danielj@mellanox.com>
> Subject: RE: regression: nvme rdma with bnxt_re0 broken
> 
> Hi Yi Zhang,
> 
> > -----Original Message-----
> > From: Yi Zhang <yi.zhang@redhat.com>
> > Sent: Thursday, July 11, 2019 3:17 PM
> > To: linux-nvme@lists.infradead.org
> > Cc: Daniel Jurgens <danielj@mellanox.com>; Parav Pandit
> > <parav@mellanox.com>
> > Subject: regression: nvme rdma with bnxt_re0 broken
> >
> > [...]
> 
> I need some more information from you to debug this issue, as I don't have
> the hw.
> The highlighted patch added support for IPv6 link local addresses on vlan
> netdevices. I am unsure how this can affect IPv4 AH creation, which is what
> is failing here.
> 
> 1. Before you assign the IP address to the netdevice, please run:
>    echo -n "module ib_core +p" > /sys/kernel/debug/dynamic_debug/control
> 
> Please share the output below before doing nvme connect.
> 2. Output of the show_gids script [1]
> If getting the script is problematic, share the output of:
> 
> $ cat /sys/class/infiniband/bnxt_re0/ports/1/gids/*
> $ cat /sys/class/infiniband/bnxt_re0/ports/1/gid_attrs/ndevs/*
> $ ip link show
> $ ip addr show
> $ dmesg
> 
> [1] https://community.mellanox.com/s/article/understanding-show-gids-script#jive_content_id_The_Script
> 
> I suspect the driver's assumption about GID indices might have gone wrong
> here in drivers/infiniband/hw/bnxt_re/ib_verbs.c.
> Let's see the results to confirm that.


* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-11 16:18     ` Parav Pandit
@ 2019-07-12  1:53       ` Yi Zhang
  2019-07-12  2:49         ` Parav Pandit
  0 siblings, 1 reply; 18+ messages in thread
From: Yi Zhang @ 2019-07-12  1:53 UTC (permalink / raw)
  To: Parav Pandit, linux-nvme
  Cc: Daniel Jurgens, linux-rdma, Devesh Sharma, selvin.xavier

Hi Parav

Here is the info, let me know if it's enough, thanks.

[root@rdma-perf-07 ~]$ echo -n "module ib_core +p" > /sys/kernel/debug/dynamic_debug/control
[root@rdma-perf-07 ~]$ ifdown bnxt_roce
Device 'bnxt_roce' successfully disconnected.
[root@rdma-perf-07 ~]$ ifup bnxt_roce
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/16)
[root@rdma-perf-07 ~]$ sh a.sh
DEV       PORT  INDEX  GID                                      IPv4            VER  DEV
---       ----  -----  ---                                      ------------    ---  ---
bnxt_re0  1     0      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v1   bnxt_roce
bnxt_re0  1     1      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v2   bnxt_roce
bnxt_re0  1     10     0000:0000:0000:0000:0000:ffff:ac1f:2bbb  172.31.43.187   v1   bnxt_roce.43
bnxt_re0  1     11     0000:0000:0000:0000:0000:ffff:ac1f:2bbb  172.31.43.187   v2   bnxt_roce.43
bnxt_re0  1     2      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v1   bnxt_roce.45
bnxt_re0  1     3      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v2   bnxt_roce.45
bnxt_re0  1     4      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v1   bnxt_roce.43
bnxt_re0  1     5      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v2   bnxt_roce.43
bnxt_re0  1     6      0000:0000:0000:0000:0000:ffff:ac1f:28bb  172.31.40.187   v1   bnxt_roce
bnxt_re0  1     7      0000:0000:0000:0000:0000:ffff:ac1f:28bb  172.31.40.187   v2   bnxt_roce
bnxt_re0  1     8      0000:0000:0000:0000:0000:ffff:ac1f:2dbb  172.31.45.187   v1   bnxt_roce.45
bnxt_re0  1     9      0000:0000:0000:0000:0000:ffff:ac1f:2dbb  172.31.45.187   v2   bnxt_roce.45
bnxt_re1  1     0      fe80:0000:0000:0000:020a:f7ff:fee3:6e33                  v1   lom_2
bnxt_re1  1     1      fe80:0000:0000:0000:020a:f7ff:fee3:6e33                  v2   lom_2
cxgb4_0   1     0      0007:433b:f5b0:0000:0000:0000:0000:0000                  v1
cxgb4_0   2     0      0007:433b:f5b8:0000:0000:0000:0000:0000                  v1
hfi1_0    1     0      fe80:0000:0000:0000:0011:7501:0109:6c60                  v1
hfi1_0    1     1      fe80:0000:0000:0000:0006:6a00:0000:0005                  v1
mlx5_0    1     0      fe80:0000:0000:0000:506b:4b03:00f3:8a38                  v1
n_gids_found=19

[root@rdma-perf-07 ~]$ dmesg | tail -15
[   19.744421] IPv6: ADDRCONF(NETDEV_CHANGE): mlx5_ib0.8002: link becomes ready
[   19.758371] IPv6: ADDRCONF(NETDEV_CHANGE): mlx5_ib0.8004: link becomes ready
[   20.010469] hfi1 0000:d8:00.0: hfi1_0: Switching to NO_DMA_RTAIL
[   20.440580] IPv6: ADDRCONF(NETDEV_CHANGE): mlx5_ib0.8006: link becomes ready
[   21.098510] bnxt_en 0000:19:00.0 bnxt_roce: Too many traffic classes requested: 8. Max supported is 2.
[   21.324341] bnxt_en 0000:19:00.0 bnxt_roce: Too many traffic classes requested: 8. Max supported is 2.
[   22.058647] IPv6: ADDRCONF(NETDEV_CHANGE): hfi1_opa0: link becomes ready
[  211.407329] _ib_cache_gid_del: can't delete gid fe80:0000:0000:0000:020a:f7ff:fee3:6e32 error=-22
[  211.407334] _ib_cache_gid_del: can't delete gid fe80:0000:0000:0000:020a:f7ff:fee3:6e32 error=-22
[  211.425275] infiniband bnxt_re0: del_gid port=1 index=6 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
[  211.425280] infiniband bnxt_re0: free_gid_entry_locked port=1 index=6 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
[  211.425292] infiniband bnxt_re0: del_gid port=1 index=7 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
[  211.425461] infiniband bnxt_re0: free_gid_entry_locked port=1 index=7 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
[  225.474061] infiniband bnxt_re0: store_gid_entry port=1 index=6 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
[  225.474075] infiniband bnxt_re0: store_gid_entry port=1 index=7 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb


On 7/12/19 12:18 AM, Parav Pandit wrote:
> [...]


* RE: regression: nvme rdma with bnxt_re0 broken
  2019-07-12  1:53       ` Yi Zhang
@ 2019-07-12  2:49         ` Parav Pandit
  2019-07-12  3:45           ` Selvin Xavier
  0 siblings, 1 reply; 18+ messages in thread
From: Parav Pandit @ 2019-07-12  2:49 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme
  Cc: Daniel Jurgens, linux-rdma, Devesh Sharma, selvin.xavier

Hi Yi Zhang,

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Yi Zhang
> Sent: Friday, July 12, 2019 7:23 AM
> To: Parav Pandit <parav@mellanox.com>; linux-nvme@lists.infradead.org
> Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> Devesh Sharma <devesh.sharma@broadcom.com>;
> selvin.xavier@broadcom.com
> Subject: Re: regression: nvme rdma with bnxt_re0 broken
> 
> [...]
> 
> 
GID table looks fine.

> On 7/12/19 12:18 AM, Parav Pandit wrote:
> > [...]
> >>> [  476.320742] bnxt_en 0000:19:00.0: QPLIB: cmdq[0x4b9]=0x15 status 0x5 

Devesh, Selvin,

What does this error mean?
bnxt_qplib_create_ah() is failing.

> [...]


* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-12  2:49         ` Parav Pandit
@ 2019-07-12  3:45           ` Selvin Xavier
  2019-07-12  9:28             ` Parav Pandit
  0 siblings, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2019-07-12  3:45 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Yi Zhang, linux-nvme, Daniel Jurgens, linux-rdma, Devesh Sharma

On Fri, Jul 12, 2019 at 8:19 AM Parav Pandit <parav@mellanox.com> wrote:
>
> [...]
> GID table looks fine.
>
The GID table has the fe80:0000:0000:0000:020a:f7ff:fee3:6e32 entry repeated
six times, two for each of the interfaces bnxt_roce, bnxt_roce.43 and
bnxt_roce.45. Is it expected that the same GID entries exist for the vlan and
base interfaces? As you mentioned earlier, the driver's assumption that at
most two identical GID entries exist (one for RoCE v1 and one for RoCE v2)
is breaking here.
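
For illustration, a minimal sketch of the kind of by-value lookup that breaks
here (the types and names below are hypothetical, not the actual bnxt_re
code): with six identical fe80:: entries in the table, the first match wins,
and it may belong to a different netdev/vlan than the one the AH is actually
being created for.

	/* Hypothetical table and lookup, for illustration only. */
	struct sgid_tbl {
		union ib_gid *entries;	/* GIDs, addressed by index */
		int max;
	};

	static int find_gid_index(const struct sgid_tbl *tbl,
				  const union ib_gid *gid)
	{
		int i;

		for (i = 0; i < tbl->max; i++)
			if (!memcmp(&tbl->entries[i], gid, sizeof(*gid)))
				return i;	/* ignores netdev and vlan */
		return -ENOENT;
	}

The core GID cache hands the driver a specific index, so any driver-side
re-lookup by GID value alone can land on the wrong entry once duplicate GIDs
are legitimate.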

> > [...]
> > >>> [  476.320742] bnxt_en 0000:19:00.0: QPLIB: cmdq[0x4b9]=0x15 status 0x5
>
> Devesh, Selvin,
>
> What does this error mean?
> bnxt_qplib_create_ah() is failing.
>
We are passing a wrong index for the GID to the FW because of the
assumption mentioned earlier. The FW is failing the command due to this.

> [...]


* RE: regression: nvme rdma with bnxt_re0 broken
  2019-07-12  3:45           ` Selvin Xavier
@ 2019-07-12  9:28             ` Parav Pandit
  2019-07-12  9:39               ` Parav Pandit
  0 siblings, 1 reply; 18+ messages in thread
From: Parav Pandit @ 2019-07-12  9:28 UTC (permalink / raw)
  To: Selvin Xavier
  Cc: Yi Zhang, linux-nvme, Daniel Jurgens, linux-rdma, Devesh Sharma

Hi Selvin,

> -----Original Message-----
> From: Selvin Xavier <selvin.xavier@broadcom.com>
> Sent: Friday, July 12, 2019 9:16 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Yi Zhang <yi.zhang@redhat.com>; linux-nvme@lists.infradead.org; Daniel
> Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org; Devesh
> Sharma <devesh.sharma@broadcom.com>
> Subject: Re: regression: nvme rdma with bnxt_re0 broken
> 
> [...]
> The GID table has the fe80:0000:0000:0000:020a:f7ff:fee3:6e32 entry repeated
> six times, two for each of the interfaces bnxt_roce, bnxt_roce.43 and
> bnxt_roce.45. Is it expected that the same GID entries exist for the vlan and
> base interfaces? As you mentioned earlier, the driver's assumption that at
> most two identical GID entries exist (one for RoCE v1 and one for RoCE v2)
> is breaking here.
> 
Yes, this is correct behavior. Each vlan netdev interface is in a different L2 segment.
The vlan netdev has its own ipv6 link local address, so it is added to the GID table.
A given GID table entry is identified uniquely by GID+ndev+type(v1/v2).
(In the GID table Yi Zhang posted, for example, indices 0, 2 and 4 hold the
same link local GID but point to bnxt_roce, bnxt_roce.45 and bnxt_roce.43
respectively.)

Reviewing bnxt_qplib_add_sgid(), it does the comparison below:
if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {

This comparison looks incomplete, as it ignores netdev and type.
But even so, I would expect GID entry addition to fail for the vlans' ipv6 link local entries.

I am puzzled now: with the above memcmp() check, how do both v1 and v2
entries get added to your table, and for the vlans too?
I would expect add_gid() and core/cache.c add_roce_gid() to fail for the duplicate entry.
But the GID table that Yi Zhang dumped has these vlan entries.
I am missing something.

Yi Zhang,
Instead of the last 15 lines of dmesg, can you please share the whole dmesg
log, with debug enabled before creating the vlans, using:
echo -n "module ib_core +p" > /sys/kernel/debug/dynamic_debug/control

Selvin,
Additionally, the driver shouldn't be looking for duplicate entries; the core
already does that. You might only want to do it for the v1/v2 case, as the
bnxt driver has some dependency on it.
Can you please fix this part?

> [...]


* RE: regression: nvme rdma with bnxt_re0 broken
  2019-07-12  9:28             ` Parav Pandit
@ 2019-07-12  9:39               ` Parav Pandit
  2019-07-12  9:49                 ` Parav Pandit
  0 siblings, 1 reply; 18+ messages in thread
From: Parav Pandit @ 2019-07-12  9:39 UTC (permalink / raw)
  To: Parav Pandit, Selvin Xavier
  Cc: Yi Zhang, linux-nvme, Daniel Jurgens, linux-rdma, Devesh Sharma



> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Friday, July 12, 2019 2:58 PM
> To: Selvin Xavier <selvin.xavier@broadcom.com>
> Cc: Yi Zhang <yi.zhang@redhat.com>; linux-nvme@lists.infradead.org; Daniel
> Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org; Devesh
> Sharma <devesh.sharma@broadcom.com>
> Subject: RE: regression: nvme rdma with bnxt_re0 broken
> 
> [...]
> I am puzzled now: with the above memcmp() check, how do both v1 and v2
> entries get added to your table, and for the vlans too?
> I would expect add_gid() and core/cache.c add_roce_gid() to fail for the
> duplicate entry.
> But the GID table that Yi Zhang dumped has these vlan entries.
> I am missing something.
> 
Ah, found it.
bnxt_re_add_gid() checks for -EALREADY and returns 0 to the add_gid() callback.
OK, so you just need to extend bnxt_qplib_add_sgid() to consider the vlan too.
Let me see if I can share a patch in a few minutes.
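
A minimal sketch of that direction (assuming bnxt_qplib_add_sgid() gains a
vlan_id argument and the table entries store one; the field and parameter
names here are illustrative, not a tested patch):

	/* Treat (GID, vlan_id) as the key instead of the GID alone, so the
	 * same link local GID on the base netdev and on each vlan gets its
	 * own entry and resolves to its own index.
	 */
	for (i = 0; i < sgid_tbl->max; i++) {
		if (!memcmp(&sgid_tbl->tbl[i].gid, gid, sizeof(*gid)) &&
		    sgid_tbl->tbl[i].vlan_id == vlan_id)
			return -EALREADY;	/* duplicate within one L2 segment */
	}

That way create_ah resolves the index against the entry that actually matches
the vlan, instead of the first entry with the same GID value.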

> Yi Zhang,
> Instead of last 15 lines of dmesg, can you please share the whole dmsg log
> which should be enabled before creating vlans.
> using
> echo -n "module ib_core +p" /sys/kernel/debug/dynamic_debug/control
> 
> Selvin,
> Additionally, driver shouldn't be looking at the duplicate entries. core already
> does it.
> You might only want to do for v1/v2 case as bnxt driver has some dependency
> with it.
> Can you please fix this part?
> 
> > > > On 7/12/19 12:18 AM, Parav Pandit wrote:
> > > > > Sagi,
> > > > >
> > > > > This is better one to cc to linux-rdma.
> > > > >
> > > > > + Devesh, Selvin.
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Parav Pandit
> > > > >> Sent: Thursday, July 11, 2019 6:25 PM
> > > > >> To: Yi Zhang <yi.zhang@redhat.com>;
> > > > >> linux-nvme@lists.infradead.org
> > > > >> Cc: Daniel Jurgens <danielj@mellanox.com>
> > > > >> Subject: RE: regression: nvme rdma with bnxt_re0 broken
> > > > >>
> > > > >> Hi Yi Zhang,
> > > > >>
> > > > >>> -----Original Message-----
> > > > >>> From: Yi Zhang <yi.zhang@redhat.com>
> > > > >>> Sent: Thursday, July 11, 2019 3:17 PM
> > > > >>> To: linux-nvme@lists.infradead.org
> > > > >>> Cc: Daniel Jurgens <danielj@mellanox.com>; Parav Pandit
> > > > >>> <parav@mellanox.com>
> > > > >>> Subject: regression: nvme rdma with bnxt_re0 broken
> > > > >>>
> > > > >>> Hello
> > > > >>>
> > > > >>> 'nvme connect' failed when use bnxt_re0 on latest upstream
> > > > >>> build[1], by bisecting I found it was introduced from
> > > > >>> v5.2.0-rc1 with [2], it works after I revert it.
> > > > >>> Let me know if you need more info, thanks.
> > > > >>>
> > > > >>> [1]
> > > > >>> [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125
> > > > >>> -s
> > > > >>> 4420 -n testnqn Failed to write to /dev/nvme-fabrics: Bad
> > > > >>> address
> > > > >>>
> > > > >>> [root@rdma-perf-07 ~]$ dmesg
> > > > >>> [  476.320742] bnxt_en 0000:19:00.0: QPLIB: cmdq[0x4b9]=0x15
> > > > >>> status 0x5
> > >
> > > Devesh, Selvin,
> > >
> > > What does this error mean?
> > > bnxt_qplib_create_ah() is failing.
> > >
> > We are passing a wrong index for the GID to FW because of the
> > assumption mentioned earlier.
> > FW is failing command due to this.
> >
> > > > >>> [ 476.327103] infiniband bnxt_re0: Failed to allocate HW AH
> > > > >>> [ 476.332525] nvme nvme2: rdma_connect failed (-14).
> > > > >>> [ 476.343552] nvme nvme2: rdma connection establishment failed (-14)
> > > > >>>
> > > > >>> [root@rdma-perf-07 ~]$ lspci  | grep -i Broadcom
> > > > >>> 01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries
> > > > >>> NetXtreme
> > > > >>> BCM5720 2-port Gigabit Ethernet PCIe
> > > > >>> 01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries
> > > > >>> NetXtreme
> > > > >>> BCM5720 2-port Gigabit Ethernet PCIe
> > > > >>> 18:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3
> > > > >>> 3008 [Fury] (rev
> > > > >>> 02)
> > > > >>> 19:00.0 Ethernet controller: Broadcom Inc. and subsidiaries
> > > > >>> BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
> > > > >>> 19:00.1 Ethernet controller: Broadcom Inc. and subsidiaries
> > > > >>> BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
> > > > >>>
> > > > >>>
> > > > >>> [2]
> > > > >>> commit 823b23da71132b80d9f41ab667c68b112455f3b6
> > > > >>> Author: Parav Pandit <parav@mellanox.com>
> > > > >>> Date:   Wed Apr 10 11:23:03 2019 +0300
> > > > >>>
> > > > >>>      IB/core: Allow vlan link local address based RoCE GIDs
> > > > >>>
> > > > >>>      IPv6 link local address for a VLAN netdevice has nothing to do with its
> > > > >>>      resemblance with the default GID, because VLAN link local GID is in
> > > > >>>      different layer 2 domain.
> > > > >>>
> > > > >>>      Now that RoCE MAD packet processing and route resolution consider the
> > > > >>>      right GID index, there is no need for an unnecessary check which prevents
> > > > >>>      the addition of vlan based IPv6 link local GIDs.
> > > > >>>
> > > > >>>      Signed-off-by: Parav Pandit <parav@mellanox.com>
> > > > >>>      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
> > > > >>>      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > > > >>>      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> Best Regards,
> > > > >>>    Yi Zhang
> > > > >>>
> > > > >> I need some more information from you to debug this issue as I
> > > > >> don’t have the hw.
> > > > >> The highlighted patch added support for IPv6 link local address
> > > > >> for vlan. I am unsure how this can affect IPv4 AH creation for
> > > > >> which there is failure.
> > > > >>
> > > > >> 1. Before you assign the IP address to the netdevice, Please
> > > > >> do, echo -n "module ib_core +p" >
> > > > >> /sys/kernel/debug/dynamic_debug/control
> > > > >>
> > > > >> Please share below output before doing nvme connect.
> > > > >> 2. Output of script [1]
> > > > >> $ show_gids script
> > > > >> If getting this script is problematic, share the output of,
> > > > >>
> > > > >> $ cat /sys/class/infiniband/bnxt_re0/ports/1/gids/*
> > > > >> $ cat /sys/class/infiniband/bnxt_re0/ports/1/gid_attrs/ndevs/*
> > > > >> $ ip link show
> > > > >> $ip addr show
> > > > >> $ dmesg
> > > > >>
> > > > >> [1]
> > > > >> https://community.mellanox.com/s/article/understanding-show-gids-script#jive_content_id_The_Script
> > > > >>
> > > > >> I suspect that driver's assumption about GID indices might have
> > > > >> gone wrong here in drivers/infiniband/hw/bnxt_re/ib_verbs.c.
> > > > >> Lets see about results to confirm that.
> > > > > _______________________________________________
> > > > > Linux-nvme mailing list
> > > > > Linux-nvme@lists.infradead.org
> > > > > http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: regression: nvme rdma with bnxt_re0 broken
  2019-07-12  9:39               ` Parav Pandit
@ 2019-07-12  9:49                 ` Parav Pandit
  2019-07-12 11:41                   ` Yi Zhang
  0 siblings, 1 reply; 18+ messages in thread
From: Parav Pandit @ 2019-07-12  9:49 UTC (permalink / raw)
  To: Selvin Xavier
  Cc: Yi Zhang, linux-nvme, Daniel Jurgens, linux-rdma, Devesh Sharma


> > Hi Selvin,
> >
> > > -----Original Message-----
> > > From: Selvin Xavier <selvin.xavier@broadcom.com>
> > > Sent: Friday, July 12, 2019 9:16 AM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: Yi Zhang <yi.zhang@redhat.com>; linux-nvme@lists.infradead.org;
> > > Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> > > Devesh Sharma <devesh.sharma@broadcom.com>
> > > Subject: Re: regression: nvme rdma with bnxt_re0 broken
> > >
> > > On Fri, Jul 12, 2019 at 8:19 AM Parav Pandit <parav@mellanox.com>
> wrote:
> > > >
> > > > GID table looks fine.
> > > >
> > > The GID table has  fe80:0000:0000:0000:020a:f7ff:fee3:6e32 entry
> > > repeated 6 times. 2 for each interface bnxt_roce, bnxt_roce.43 and
> > > bnxt_roce.45. Is this expected to have same gid entries for vlan and
> > > base interfaces? As you mentioned earlier, driver's assumption that
> > > only 2 GID entries identical (one for RoCE v1 and one for RoCE
> > > v2)   is breaking here.
> > >
> > Yes, this is correct behavior. Each vlan netdev interface is in
> > different L2 segment.
> > Vlan netdev has this ipv6 link local address. Hence, it is added to the GID
> table.
> > A given GID table entry is identified uniquely by GID+ndev+type(v1/v2).
> >
> > Reviewing bnxt_qplib_add_sgid() does the comparison below.
> > if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
> >
> > This comparison looks incomplete which ignore netdev and type.
> > But even with that, I would expect GID entry addition failure for
> > vlans for ipv6 link local entries.
> >
> > But I am puzzled now, that , with above memcmp() check, how does both
> > v1 and v2 entries get added in your table and for vlans too?
> > I expect add_gid() and core/cache.c add_roce_gid () to fail for the
> > duplicate entry.
> > But GID table that Yi Zhang dumped has these vlan entries.
> > I am missing something.
> >
> Ah, found it.
> bnxt_re_add_gid() checks for -EALREADY and returns 0 to the add_gid() callback.
> OK, so you just need to extend bnxt_qplib_add_sgid() to consider the vlan too.
> Let me see if I can share a patch in a few minutes.
> 
> > Yi Zhang,
> > Instead of the last 15 lines of dmesg, can you please share the whole dmesg
> > log, with dynamic debug enabled before creating the vlans, using
> > echo -n "module ib_core +p" > /sys/kernel/debug/dynamic_debug/control
> >
> > Selvin,
> > Additionally, driver shouldn't be looking at the duplicate entries.
> > core already does it.
> > You might only want to do for v1/v2 case as bnxt driver has some
> > dependency with it.
> > Can you please fix this part?
> >

How about the fix below?

From f3f17008d34b5a0c38c190010281a3030a8e5771 Mon Sep 17 00:00:00 2001
From: Parav Pandit <parav@mellanox.com>
Date: Fri, 12 Jul 2019 04:34:52 -0500
Subject: [PATCH] IB/bnxt_re: Honor vlan_id in GID entry comparison

A GID entry consists of a GID, vlan, netdev and smac.
Extend the GID duplicate check to compare the vlan_id as well,
to support IPv6 VLAN based link local addresses.

Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
Change-Id: I2e026ec8065c8425ba24fad8525323d112a2f4e4
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/infiniband/hw/bnxt_re/qplib_sp.c | 4 +++-
 drivers/infiniband/hw/bnxt_re/qplib_sp.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
index 48793d3512ac..8567b7367669 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
@@ -296,7 +296,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 	}
 	free_idx = sgid_tbl->max;
 	for (i = 0; i < sgid_tbl->max; i++) {
-		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
+		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
+		    sgid_tbl->tbl[i].vlan_id == vlan_id) {
 			dev_dbg(&res->pdev->dev,
 				"SGID entry already exist in entry %d!\n", i);
 			*index = i;
@@ -351,6 +352,7 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 	}
 	/* Add GID to the sgid_tbl */
 	memcpy(&sgid_tbl->tbl[free_idx], gid, sizeof(*gid));
+	sgid_tbl->tbl[free_idx].vlan_id = vlan_id;
 	sgid_tbl->active++;
 	if (vlan_id != 0xFFFF)
 		sgid_tbl->vlan[free_idx] = 1;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.h b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
index 0ec3b12b0bcd..7a1957c9dc6f 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
@@ -82,6 +82,7 @@ struct bnxt_qplib_pd {
 
 struct bnxt_qplib_gid {
 	u8				data[16];
+	u16 vlan_id;
 };
 
 struct bnxt_qplib_ah {
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-12  9:49                 ` Parav Pandit
@ 2019-07-12 11:41                   ` Yi Zhang
  2019-07-12 12:52                     ` Parav Pandit
  0 siblings, 1 reply; 18+ messages in thread
From: Yi Zhang @ 2019-07-12 11:41 UTC (permalink / raw)
  To: Parav Pandit, Selvin Xavier
  Cc: Daniel Jurgens, linux-rdma, Devesh Sharma, linux-nvme

Hi Parav,
The nvme connect still failed [1]; I've pasted the full dmesg log at [2],
please check it.


[1]
[root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n 
testnqn
Failed to write to /dev/nvme-fabrics: Connection reset by peer
[2]
https://pastebin.com/ipvak0Sp

Thanks

Yi


On 7/12/19 5:49 PM, Parav Pandit wrote:
>>> Hi Selvin,
>>>
>>>> -----Original Message-----
>>>> From: Selvin Xavier <selvin.xavier@broadcom.com>
>>>> Sent: Friday, July 12, 2019 9:16 AM
>>>> To: Parav Pandit <parav@mellanox.com>
>>>> Cc: Yi Zhang <yi.zhang@redhat.com>; linux-nvme@lists.infradead.org;
>>>> Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
>>>> Devesh Sharma <devesh.sharma@broadcom.com>
>>>> Subject: Re: regression: nvme rdma with bnxt_re0 broken
>>>>
>>>> On Fri, Jul 12, 2019 at 8:19 AM Parav Pandit <parav@mellanox.com>
>> wrote:
>>>>> GID table looks fine.
>>>>>
>>>> The GID table has  fe80:0000:0000:0000:020a:f7ff:fee3:6e32 entry
>>>> repeated 6 times. 2 for each interface bnxt_roce, bnxt_roce.43 and
>>>> bnxt_roce.45. Is this expected to have same gid entries for vlan and
>>>> base interfaces? As you mentioned earlier, driver's assumption that
>>>> only 2 GID entries identical (one for RoCE v1 and one for RoCE
>>>> v2)   is breaking here.
>>>>
>>> Yes, this is correct behavior. Each vlan netdev interface is in
>>> different L2 segment.
>>> Vlan netdev has this ipv6 link local address. Hence, it is added to the GID
>> table.
>>> A given GID table entry is identified uniquely by GID+ndev+type(v1/v2).
>>>
>>> Reviewing bnxt_qplib_add_sgid() does the comparison below.
>>> if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
>>>
>>> This comparison looks incomplete which ignore netdev and type.
>>> But even with that, I would expect GID entry addition failure for
>>> vlans for ipv6 link local entries.
>>>
>>> But I am puzzled now, that , with above memcmp() check, how does both
>>> v1 and v2 entries get added in your table and for vlans too?
>>> I expect add_gid() and core/cache.c add_roce_gid () to fail for the
>>> duplicate entry.
>>> But GID table that Yi Zhang dumped has these vlan entries.
>>> I am missing something.
>>>
>> Ah, found it.
>> bnxt_re_add_gid() checks for -EALREADY and returns 0 to the add_gid() callback.
>> OK, so you just need to extend bnxt_qplib_add_sgid() to consider the vlan too.
>> Let me see if I can share a patch in a few minutes.
>>
>>> Yi Zhang,
>>> Instead of the last 15 lines of dmesg, can you please share the whole dmesg
>>> log, with dynamic debug enabled before creating the vlans, using
>>> echo -n "module ib_core +p" > /sys/kernel/debug/dynamic_debug/control
>>>
>>> Selvin,
>>> Additionally, driver shouldn't be looking at the duplicate entries.
>>> core already does it.
>>> You might only want to do for v1/v2 case as bnxt driver has some
>>> dependency with it.
>>> Can you please fix this part?
>>>
> How about the fix below?
>
> [snip: Parav's patch, quoted in full from the previous message]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: regression: nvme rdma with bnxt_re0 broken
  2019-07-12 11:41                   ` Yi Zhang
@ 2019-07-12 12:52                     ` Parav Pandit
  2019-07-12 15:40                       ` Jason Gunthorpe
  2019-07-12 16:18                       ` Selvin Xavier
  0 siblings, 2 replies; 18+ messages in thread
From: Parav Pandit @ 2019-07-12 12:52 UTC (permalink / raw)
  To: Yi Zhang, Selvin Xavier
  Cc: Daniel Jurgens, linux-rdma, Devesh Sharma, linux-nvme



> -----Original Message-----
> From: Yi Zhang <yi.zhang@redhat.com>
> Sent: Friday, July 12, 2019 5:11 PM
> To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
> <selvin.xavier@broadcom.com>
> Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> Devesh Sharma <devesh.sharma@broadcom.com>; linux-
> nvme@lists.infradead.org
> Subject: Re: regression: nvme rdma with bnxt_re0 broken
> 
> Hi Parav
> The nvme connect still failed [1]; I've pasted the full dmesg log at [2], please check it.
> 
> 
> [1]
> [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
> testnqn
> Failed to write to /dev/nvme-fabrics: Connection reset by peer
> [2]
> https://pastebin.com/ipvak0Sp
> 

I think vlan_id is not initialized to 0xffff, due to which the ipv4 entry addition also failed with my patch.
This probably solves it; I am not sure, as I am not very familiar with this code.

Selvin,
Can you please take a look?

From 7b55e1d4500259cf7c02fb4d9fbbeb5ad1cfc623 Mon Sep 17 00:00:00 2001
From: Parav Pandit <parav@mellanox.com>
Date: Fri, 12 Jul 2019 04:34:52 -0500
Subject: [PATCH v1] IB/bnxt_re: Honor vlan_id in GID entry comparison

A GID entry consists of a GID, vlan, netdev and smac.
Extend the GID duplicate check to compare the vlan_id as well,
to support IPv6 VLAN based link local addresses.

GID deletion based only on GID comparison is not correct.
It needs further fixes.

Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
Change-Id: I2e026ec8065c8425ba24fad8525323d112a2f4e4
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/infiniband/hw/bnxt_re/qplib_res.c | 5 +++++
 drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 7 ++++++-
 drivers/infiniband/hw/bnxt_re/qplib_sp.h  | 1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
index 37928b1111df..216648b640ce 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
@@ -488,6 +488,8 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
 				     struct bnxt_qplib_sgid_tbl *sgid_tbl,
 				     u16 max)
 {
+	u16 i;
+
 	sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid), GFP_KERNEL);
 	if (!sgid_tbl->tbl)
 		return -ENOMEM;
@@ -500,6 +502,9 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
 	if (!sgid_tbl->ctx)
 		goto out_free2;
 
+	for (i = 0; i < max; i++)
+		sgid_tbl->tbl[i].vlan_id = 0xffff;
+
 	sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
 	if (!sgid_tbl->vlan)
 		goto out_free3;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
index 48793d3512ac..0d90be88685f 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
@@ -236,6 +236,9 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 		return -ENOMEM;
 	}
 	for (index = 0; index < sgid_tbl->max; index++) {
+		/* FIXME: GID delete should happen based on index
+		 * and refcount
+		 */
 		if (!memcmp(&sgid_tbl->tbl[index], gid, sizeof(*gid)))
 			break;
 	}
@@ -296,7 +299,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 	}
 	free_idx = sgid_tbl->max;
 	for (i = 0; i < sgid_tbl->max; i++) {
-		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
+		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
+		    sgid_tbl->tbl[i].vlan_id == vlan_id) {
 			dev_dbg(&res->pdev->dev,
 				"SGID entry already exist in entry %d!\n", i);
 			*index = i;
@@ -351,6 +355,7 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 	}
 	/* Add GID to the sgid_tbl */
 	memcpy(&sgid_tbl->tbl[free_idx], gid, sizeof(*gid));
+	sgid_tbl->tbl[free_idx].vlan_id = vlan_id;
 	sgid_tbl->active++;
 	if (vlan_id != 0xFFFF)
 		sgid_tbl->vlan[free_idx] = 1;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.h b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
index 0ec3b12b0bcd..7a1957c9dc6f 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
@@ -82,6 +82,7 @@ struct bnxt_qplib_pd {
 
 struct bnxt_qplib_gid {
 	u8				data[16];
+	u16 vlan_id;
 };
 
 struct bnxt_qplib_ah {
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-12 12:52                     ` Parav Pandit
@ 2019-07-12 15:40                       ` Jason Gunthorpe
  2019-07-12 16:29                         ` Selvin Xavier
  2019-07-12 16:18                       ` Selvin Xavier
  1 sibling, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2019-07-12 15:40 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Yi Zhang, Selvin Xavier, Daniel Jurgens, linux-rdma,
	Devesh Sharma, linux-nvme

On Fri, Jul 12, 2019 at 12:52:24PM +0000, Parav Pandit wrote:
> 
> 
> > From: Yi Zhang <yi.zhang@redhat.com>
> > Sent: Friday, July 12, 2019 5:11 PM
> > To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
> > <selvin.xavier@broadcom.com>
> > Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> > Devesh Sharma <devesh.sharma@broadcom.com>; linux-
> > nvme@lists.infradead.org
> > Subject: Re: regression: nvme rdma with bnxt_re0 broken
> > 
> > Hi Parav
> > The nvme connect still failed [1]; I've pasted the full dmesg log at [2], please check it.
> > 
> > 
> > [1]
> > [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
> > testnqn
> > Failed to write to /dev/nvme-fabrics: Connection reset by peer
> > [2]
> > https://pastebin.com/ipvak0Sp
> > 
> 
> I think vlan_id is not initialized to 0xffff due to which ipv4 entry addition also failed with my patch.
> > This probably solves it; I am not sure, as I am not very familiar with this code.
> 
> Selvin,
> Can you please take a look?
> 
> From 7b55e1d4500259cf7c02fb4d9fbbeb5ad1cfc623 Mon Sep 17 00:00:00 2001
> From: Parav Pandit <parav@mellanox.com>
> Date: Fri, 12 Jul 2019 04:34:52 -0500
> Subject: [PATCH v1] IB/bnxt_re: Honor vlan_id in GID entry comparison
> 
> A GID entry consists of a GID, vlan, netdev and smac.
> Extend the GID duplicate check to compare the vlan_id as well,
> to support IPv6 VLAN based link local addresses.
> 
> GID deletion based only on GID comparison is not correct.
> It needs further fixes.
> 
> Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
> Change-Id: I2e026ec8065c8425ba24fad8525323d112a2f4e4
> Signed-off-by: Parav Pandit <parav@mellanox.com>
>  drivers/infiniband/hw/bnxt_re/qplib_res.c | 5 +++++
>  drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 7 ++++++-
>  drivers/infiniband/hw/bnxt_re/qplib_sp.h  | 1 +
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> index 37928b1111df..216648b640ce 100644
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> @@ -488,6 +488,8 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
>  				     struct bnxt_qplib_sgid_tbl *sgid_tbl,
>  				     u16 max)
>  {
> +	u16 i;
> +
>  	sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid), GFP_KERNEL);
>  	if (!sgid_tbl->tbl)
>  		return -ENOMEM;
> @@ -500,6 +502,9 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
>  	if (!sgid_tbl->ctx)
>  		goto out_free2;
>  
> +	for (i = 0; i < max; i++)
> +		sgid_tbl->tbl[i].vlan_id = 0xffff;
> +
>  	sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
>  	if (!sgid_tbl->vlan)
>  		goto out_free3;
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> index 48793d3512ac..0d90be88685f 100644
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> @@ -236,6 +236,9 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
>  		return -ENOMEM;
>  	}
>  	for (index = 0; index < sgid_tbl->max; index++) {
> +		/* FIXME: GID delete should happen based on index
> +		 * and refcount
> +		 */
>  		if (!memcmp(&sgid_tbl->tbl[index], gid, sizeof(*gid)))
>  			break;
>  	}
> @@ -296,7 +299,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
>  	}
>  	free_idx = sgid_tbl->max;
>  	for (i = 0; i < sgid_tbl->max; i++) {
> -		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
> +		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
> +		    sgid_tbl->tbl[i].vlan_id == vlan_id) {
>  			dev_dbg(&res->pdev->dev,
>  				"SGID entry already exist in entry %d!\n", i);

bnxt guys: please just delete this duplicate detection code from the
driver. Every GID provided from the core must be programmed into the
given gid table index.
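
In outline the contract is something like this (a sketch only; the foo_*
names are made up, the ops signature is the one from
include/rdma/ib_verbs.h):

#include <linux/slab.h>
#include <rdma/ib_verbs.h>

struct foo_gid_map {
	u32 hw_idx;		/* slot in the HW IP/VLAN table */
};

/* hypothetical: programs the HW table and returns the HW slot used */
static u32 foo_hw_program(const struct ib_gid_attr *attr);

static int foo_add_gid(const struct ib_gid_attr *attr, void **context)
{
	struct foo_gid_map *map;

	map = kzalloc(sizeof(*map), GFP_KERNEL);
	if (!map)
		return -ENOMEM;

	/* attr->index is the slot the core chose; program the HW from
	 * attr->gid/attr->ndev/attr->gid_type and remember where the
	 * HW entry actually landed for del_gid/create_ah. */
	map->hw_idx = foo_hw_program(attr);
	*context = map;
	return 0;
}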

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-12 12:52                     ` Parav Pandit
  2019-07-12 15:40                       ` Jason Gunthorpe
@ 2019-07-12 16:18                       ` Selvin Xavier
  2019-07-13  7:56                         ` Yi Zhang
  1 sibling, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2019-07-12 16:18 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Yi Zhang, Daniel Jurgens, linux-rdma, Devesh Sharma, linux-nvme

On Fri, Jul 12, 2019 at 6:22 PM Parav Pandit <parav@mellanox.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Yi Zhang <yi.zhang@redhat.com>
> > Sent: Friday, July 12, 2019 5:11 PM
> > To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
> > <selvin.xavier@broadcom.com>
> > Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> > Devesh Sharma <devesh.sharma@broadcom.com>; linux-
> > nvme@lists.infradead.org
> > Subject: Re: regression: nvme rdma with bnxt_re0 broken
> >
> > Hi Parav
> > The nvme connect still failed [1]; I've pasted the full dmesg log at [2], please check it.
> >
> >
> > [1]
> > [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
> > testnqn
> > Failed to write to /dev/nvme-fabrics: Connection reset by peer
> > [2]
> > https://pastebin.com/ipvak0Sp
> >
>
> I think vlan_id is not initialized to 0xffff due to which ipv4 entry addition also failed with my patch.
> This probably solves it; I am not sure, as I am not very familiar with this code.
>
> Selvin,
> Can you please take a look?
>
Parav,
 Thanks for the patch. I removed the change you made in struct
bnxt_qplib_gid and added a new structure, struct bnxt_qplib_gid_info,
holding both the gid and the vlan_id, since struct bnxt_qplib_gid is
used in multiple places to memcpy or compare the 16 bytes.
I also added the vlan check in the destroy path. Some more cleanup is
possible in the delete_gid path; that can be done once the basic issue
is fixed.

Yi, can you please test this patch and see if it solves the issue?

Thanks,
Selvin

From 3d83613cfc5993bf9dae529ab0d4d25ddefff29b Mon Sep 17 00:00:00 2001
From: Parav Pandit <parav@mellanox.com>
Date: Fri, 12 Jul 2019 10:32:51 -0400
Subject: [PATCH] IB/bnxt_re: Honor vlan_id in GID entry comparison

A GID entry consists of a GID, vlan, netdev and smac.
Extend the GID duplicate check to compare the vlan_id as well,
to support IPv6 VLAN based link local addresses.

Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  |  7 +++++--
 drivers/infiniband/hw/bnxt_re/qplib_res.c | 17 +++++++++++++----
 drivers/infiniband/hw/bnxt_re/qplib_res.h |  2 +-
 drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 14 +++++++++-----
 drivers/infiniband/hw/bnxt_re/qplib_sp.h  |  7 ++++++-
 5 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index a91653aabf38..098ab883733e 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -308,6 +308,7 @@ int bnxt_re_del_gid(const struct ib_gid_attr *attr, void **context)
 	struct bnxt_re_dev *rdev = to_bnxt_re_dev(attr->device, ibdev);
 	struct bnxt_qplib_sgid_tbl *sgid_tbl = &rdev->qplib_res.sgid_tbl;
 	struct bnxt_qplib_gid *gid_to_del;
+	u16 vlan_id = 0xFFFF;
 
 	/* Delete the entry from the hardware */
 	ctx = *context;
@@ -317,7 +318,8 @@ int bnxt_re_del_gid(const struct ib_gid_attr *attr, void **context)
 	if (sgid_tbl && sgid_tbl->active) {
 		if (ctx->idx >= sgid_tbl->max)
 			return -EINVAL;
-		gid_to_del = &sgid_tbl->tbl[ctx->idx];
+		gid_to_del = &sgid_tbl->tbl[ctx->idx].gid;
+		vlan_id = sgid_tbl->tbl[ctx->idx].vlan_id;
 		/* DEL_GID is called in WQ context(netdevice_event_work_handler)
 		 * or via the ib_unregister_device path. In the former case QP1
 		 * may not be destroyed yet, in which case just return as FW
@@ -335,7 +337,8 @@ int bnxt_re_del_gid(const struct ib_gid_attr *attr, void **context)
 	}
 	ctx->refcnt--;
 	if (!ctx->refcnt) {
-		rc = bnxt_qplib_del_sgid(sgid_tbl, gid_to_del, true);
+		rc = bnxt_qplib_del_sgid(sgid_tbl, gid_to_del,
+					 vlan_id, true);
 		if (rc) {
 			dev_err(rdev_to_dev(rdev),
 				"Failed to remove GID: %#x", rc);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
index 37928b1111df..7f2571f7a13f 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
@@ -488,7 +488,10 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
 				     struct bnxt_qplib_sgid_tbl *sgid_tbl,
 				     u16 max)
 {
-	sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid), GFP_KERNEL);
+	u16 i;
+
+	sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid_info),
+				GFP_KERNEL);
 	if (!sgid_tbl->tbl)
 		return -ENOMEM;
 
@@ -500,6 +503,9 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
 	if (!sgid_tbl->ctx)
 		goto out_free2;
 
+	for (i = 0; i < max; i++)
+		sgid_tbl->tbl[i].vlan_id = 0xffff;
+
 	sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
 	if (!sgid_tbl->vlan)
 		goto out_free3;
@@ -526,9 +532,11 @@ static void bnxt_qplib_cleanup_sgid_tbl(struct bnxt_qplib_res *res,
 	for (i = 0; i < sgid_tbl->max; i++) {
 		if (memcmp(&sgid_tbl->tbl[i], &bnxt_qplib_gid_zero,
 			   sizeof(bnxt_qplib_gid_zero)))
-			bnxt_qplib_del_sgid(sgid_tbl, &sgid_tbl->tbl[i], true);
+			bnxt_qplib_del_sgid(sgid_tbl, &sgid_tbl->tbl[i].gid,
+					    sgid_tbl->tbl[i].vlan_id, true);
 	}
-	memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid) * sgid_tbl->max);
+	memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid_info) *
+	       sgid_tbl->max);
 	memset(sgid_tbl->hw_id, -1, sizeof(u16) * sgid_tbl->max);
 	memset(sgid_tbl->vlan, 0, sizeof(u8) * sgid_tbl->max);
 	sgid_tbl->active = 0;
@@ -537,7 +545,8 @@ static void bnxt_qplib_cleanup_sgid_tbl(struct bnxt_qplib_res *res,
 static void bnxt_qplib_init_sgid_tbl(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 				     struct net_device *netdev)
 {
-	memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid) * sgid_tbl->max);
+	memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid_info) *
+	       sgid_tbl->max);
 	memset(sgid_tbl->hw_id, -1, sizeof(u16) * sgid_tbl->max);
 }
 
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.h b/drivers/infiniband/hw/bnxt_re/qplib_res.h
index 30c42c92fac7..fbda11a7ab1a 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.h
@@ -111,7 +111,7 @@ struct bnxt_qplib_pd_tbl {
 };
 
 struct bnxt_qplib_sgid_tbl {
-	struct bnxt_qplib_gid		*tbl;
+	struct bnxt_qplib_gid_info	*tbl;
 	u16				*hw_id;
 	u16				max;
 	u16				active;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
index 48793d3512ac..40296b97d21e 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
@@ -213,12 +213,12 @@ int bnxt_qplib_get_sgid(struct bnxt_qplib_res *res,
 			index, sgid_tbl->max);
 		return -EINVAL;
 	}
-	memcpy(gid, &sgid_tbl->tbl[index], sizeof(*gid));
+	memcpy(gid, &sgid_tbl->tbl[index].gid, sizeof(*gid));
 	return 0;
 }
 
 int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
-			struct bnxt_qplib_gid *gid, bool update)
+			struct bnxt_qplib_gid *gid, u16 vlan_id, bool update)
 {
 	struct bnxt_qplib_res *res = to_bnxt_qplib(sgid_tbl,
 						   struct bnxt_qplib_res,
@@ -236,7 +236,8 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 		return -ENOMEM;
 	}
 	for (index = 0; index < sgid_tbl->max; index++) {
-		if (!memcmp(&sgid_tbl->tbl[index], gid, sizeof(*gid)))
+		if (!memcmp(&sgid_tbl->tbl[index].gid, gid, sizeof(*gid)) &&
+		    vlan_id == sgid_tbl->tbl[index].vlan_id)
 			break;
 	}
 	if (index == sgid_tbl->max) {
@@ -262,8 +263,9 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 		if (rc)
 			return rc;
 	}
-	memcpy(&sgid_tbl->tbl[index], &bnxt_qplib_gid_zero,
+	memcpy(&sgid_tbl->tbl[index].gid, &bnxt_qplib_gid_zero,
 	       sizeof(bnxt_qplib_gid_zero));
+	sgid_tbl->tbl[index].vlan_id = 0xFFFF;
 	sgid_tbl->vlan[index] = 0;
 	sgid_tbl->active--;
 	dev_dbg(&res->pdev->dev,
@@ -296,7 +298,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 	}
 	free_idx = sgid_tbl->max;
 	for (i = 0; i < sgid_tbl->max; i++) {
-		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
+		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
+		    sgid_tbl->tbl[i].vlan_id == vlan_id) {
 			dev_dbg(&res->pdev->dev,
 				"SGID entry already exist in entry %d!\n", i);
 			*index = i;
@@ -351,6 +354,7 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 	}
 	/* Add GID to the sgid_tbl */
 	memcpy(&sgid_tbl->tbl[free_idx], gid, sizeof(*gid));
+	sgid_tbl->tbl[free_idx].vlan_id = vlan_id;
 	sgid_tbl->active++;
 	if (vlan_id != 0xFFFF)
 		sgid_tbl->vlan[free_idx] = 1;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.h b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
index 0ec3b12b0bcd..b5c4ce302c61 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
@@ -84,6 +84,11 @@ struct bnxt_qplib_gid {
 	u8				data[16];
 };
 
+struct bnxt_qplib_gid_info {
+	struct bnxt_qplib_gid gid;
+	u16 vlan_id;
+};
+
 struct bnxt_qplib_ah {
 	struct bnxt_qplib_gid		dgid;
 	struct bnxt_qplib_pd		*pd;
@@ -221,7 +226,7 @@ int bnxt_qplib_get_sgid(struct bnxt_qplib_res *res,
 			struct bnxt_qplib_sgid_tbl *sgid_tbl, int index,
 			struct bnxt_qplib_gid *gid);
 int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
-			struct bnxt_qplib_gid *gid, bool update);
+			struct bnxt_qplib_gid *gid, u16 vlan_id, bool update);
 int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 			struct bnxt_qplib_gid *gid, u8 *mac, u16 vlan_id,
 			bool update, u32 *index);
-- 
2.18.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-12 15:40                       ` Jason Gunthorpe
@ 2019-07-12 16:29                         ` Selvin Xavier
  2019-07-12 17:42                           ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2019-07-12 16:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Parav Pandit, Yi Zhang, Daniel Jurgens, linux-rdma,
	Devesh Sharma, linux-nvme

On Fri, Jul 12, 2019 at 9:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Jul 12, 2019 at 12:52:24PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Yi Zhang <yi.zhang@redhat.com>
> > > Sent: Friday, July 12, 2019 5:11 PM
> > > To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
> > > <selvin.xavier@broadcom.com>
> > > Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> > > Devesh Sharma <devesh.sharma@broadcom.com>; linux-
> > > nvme@lists.infradead.org
> > > Subject: Re: regression: nvme rdma with bnxt_re0 broken
> > >
> > > Hi Parav
> > > The nvme connect still failed [1]; I've pasted the full dmesg log at [2], please check it.
> > >
> > >
> > > [1]
> > > [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
> > > testnqn
> > > Failed to write to /dev/nvme-fabrics: Connection reset by peer
> > > [2]
> > > https://pastebin.com/ipvak0Sp
> > >
> >
> > I think vlan_id is not initialized to 0xffff due to which ipv4 entry addition also failed with my patch.
> > This probably solves it; I am not sure, as I am not very familiar with this code.
> >
> > Selvin,
> > Can you please take a look?
> >
> > From 7b55e1d4500259cf7c02fb4d9fbbeb5ad1cfc623 Mon Sep 17 00:00:00 2001
> > From: Parav Pandit <parav@mellanox.com>
> > Date: Fri, 12 Jul 2019 04:34:52 -0500
> > Subject: [PATCH v1] IB/bnxt_re: Honor vlan_id in GID entry comparison
> >
> > A GID entry consists of a GID, vlan, netdev and smac.
> > Extend the GID duplicate check to compare the vlan_id as well,
> > to support IPv6 VLAN based link local addresses.
> >
> > GID deletion based only on GID comparison is not correct.
> > It needs further fixes.
> >
> > Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
> > Change-Id: I2e026ec8065c8425ba24fad8525323d112a2f4e4
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> >  drivers/infiniband/hw/bnxt_re/qplib_res.c | 5 +++++
> >  drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 7 ++++++-
> >  drivers/infiniband/hw/bnxt_re/qplib_sp.h  | 1 +
> >  3 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> > index 37928b1111df..216648b640ce 100644
> > +++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> > @@ -488,6 +488,8 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
> >                                    struct bnxt_qplib_sgid_tbl *sgid_tbl,
> >                                    u16 max)
> >  {
> > +     u16 i;
> > +
> >       sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid), GFP_KERNEL);
> >       if (!sgid_tbl->tbl)
> >               return -ENOMEM;
> > @@ -500,6 +502,9 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
> >       if (!sgid_tbl->ctx)
> >               goto out_free2;
> >
> > +     for (i = 0; i < max; i++)
> > +             sgid_tbl->tbl[i].vlan_id = 0xffff;
> > +
> >       sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
> >       if (!sgid_tbl->vlan)
> >               goto out_free3;
> > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> > index 48793d3512ac..0d90be88685f 100644
> > +++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> > @@ -236,6 +236,9 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
> >               return -ENOMEM;
> >       }
> >       for (index = 0; index < sgid_tbl->max; index++) {
> > +             /* FIXME: GID delete should happen based on index
> > +              * and refcount
> > +              */
> >               if (!memcmp(&sgid_tbl->tbl[index], gid, sizeof(*gid)))
> >                       break;
> >       }
> > @@ -296,7 +299,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
> >       }
> >       free_idx = sgid_tbl->max;
> >       for (i = 0; i < sgid_tbl->max; i++) {
> > -             if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
> > +             if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
> > +                 sgid_tbl->tbl[i].vlan_id == vlan_id) {
> >                       dev_dbg(&res->pdev->dev,
> >                               "SGID entry already exist in entry %d!\n", i);
>
> bnxt guys: please just delete this duplicate detection code from the
> driver. Every GID provided from the core must be programmed into the
> given gid table index.

Jason,
 This check is required in bnxt_re because the HW needs only one entry
in its table for the RoCE v1 and RoCE v2 GIDs.
Sending a second add_gid for RoCE v2 (with the same gid as RoCE v1)
to the HW returns a failure, so the driver handles this using a ref
count. During delete_gid, the entry in the HW is deleted only if the
refcount is zero.
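
Roughly, the add side is (editor-simplified sketch of bnxt_re_add_gid();
the helper signature and the ctx_tbl parameter are simplified, not the
verbatim driver code, and the usual kernel includes are assumed):

struct gid_ctx {
	u32 idx;	/* shared HW slot */
	u32 refcnt;
};

/* returns 0, or -EALREADY when the v1/v2 twin is already programmed */
static int hw_add_sgid(const struct ib_gid_attr *attr, u32 *tbl_idx);

static int sketch_add_gid(const struct ib_gid_attr *attr, void **context,
			  struct gid_ctx **ctx_tbl)
{
	struct gid_ctx *ctx;
	u32 tbl_idx;
	int rc;

	rc = hw_add_sgid(attr, &tbl_idx);
	if (rc == -EALREADY) {
		/* RoCE v1/v2 twin: share the existing HW slot */
		ctx_tbl[tbl_idx]->refcnt++;
		*context = ctx_tbl[tbl_idx];
		return 0;
	}
	if (rc)
		return rc;

	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
	if (!ctx)
		return -ENOMEM;
	ctx->idx = tbl_idx;
	ctx->refcnt = 1;
	ctx_tbl[tbl_idx] = ctx;
	*context = ctx;
	return 0;
}

del_gid then decrements ctx->refcnt and issues the HW delete only when
the count reaches zero.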

Thanks,
Selvin

>
> Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-12 16:29                         ` Selvin Xavier
@ 2019-07-12 17:42                           ` Jason Gunthorpe
  2019-07-13  7:51                             ` Selvin Xavier
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2019-07-12 17:42 UTC (permalink / raw)
  To: Selvin Xavier
  Cc: Parav Pandit, Yi Zhang, Daniel Jurgens, linux-rdma,
	Devesh Sharma, linux-nvme

On Fri, Jul 12, 2019 at 09:59:38PM +0530, Selvin Xavier wrote:

> > bnxt guys: please just delete this duplicate detection code from the
> > driver. Every GID provided from the core must be programmed into the
> > given gid table index.
> 
> Jason,
>  This check is required in bnxt_re because the HW need only one entry
> in its table for RoCE V1 and RoCE v2 Gids.

The HW doesn't have a 'GID table' if it has this restriction. It
sounds like it has some 'IP table' or maybe 'IP and VLAN' table?

So the driver must provide a full emulated 'GID Table' with all the
normal semantics.

Which looks sort of like what the add side is doing; other than the
mis-naming it seems reasonable.

But then I see this in re_create_ah:

	/*
	 * If RoCE V2 is enabled, stack will have two entries for
	 * each GID entry. Avoiding this duplicte entry in HW. Dividing
	 * the GID index by 2 for RoCE V2
	 */
	ah->qplib_ah.sgid_index = grh->sgid_index / 2;

Which seems completely wrong - that is making assumptions about the
layout of the gid table that is just not allowed.

Surely it needs to access the per-gid driver context that add_gid set
and use the index into the sgid_table from there?
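
I.e. keep the HW slot in the per-GID context and resolve it there
(sketch; bnxt_re_gid_ctx already carries such fields, but the lookup
helper below is hypothetical and does not exist today):

	struct bnxt_re_gid_ctx {
		u32	idx;		/* HW sgid_tbl slot set in add_gid */
		u32	refcnt;
	};

	/* in bnxt_re_create_ah, instead of grh->sgid_index / 2: */
	ctx = get_gid_hw_context(grh->sgid_attr);	/* hypothetical helper */
	ah->qplib_ah.sgid_index = ctx->idx;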

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-12 17:42                           ` Jason Gunthorpe
@ 2019-07-13  7:51                             ` Selvin Xavier
  2019-07-13 12:12                               ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Selvin Xavier @ 2019-07-13  7:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Parav Pandit, Yi Zhang, Daniel Jurgens, linux-rdma,
	Devesh Sharma, linux-nvme

On Fri, Jul 12, 2019 at 11:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Jul 12, 2019 at 09:59:38PM +0530, Selvin Xavier wrote:
>
> > > bnxt guys: please just delete this duplicate detection code from the
> > > driver. Every GID provided from the core must be programmed into the
> > > given gid table index.
> >
> > Jason,
> >  This check is required in bnxt_re because the HW need only one entry
> > in its table for RoCE V1 and RoCE v2 Gids.
>
> The HW doesn't have a 'GID table' if it has this restriction. It
> sounds like it has some 'IP table' or maybe 'IP and VLAN' table?
>
> So the driver must provide a full emulated 'GID Table' with all the
> normal semantics.
>
> Which looks sort of like what the add side is doing, other than the
> mis-naming it seems reasonable..
>
> But then I see this in re_create_ah:
>
>         /*
>          * If RoCE V2 is enabled, stack will have two entries for
>          * each GID entry. Avoiding this duplicte entry in HW. Dividing
>          * the GID index by 2 for RoCE V2
>          */
>         ah->qplib_ah.sgid_index = grh->sgid_index / 2;
>
> Which seems completely wrong - that is making assumptions about the
> layout of the gid table that is just not allowed.
>
> Surely it needs to access the per-gid driver context that add_gid set
> and use the index into the sgid_table from there?
>
Agreed. We need a mapping between the HW table index and the GID table index.
We can either maintain a mapping in the driver or have an ib stack function to
get the per-gid driver context from the gid table index. The latter makes sense
only if other drivers also need such an interface, and I am not sure any
other driver needs it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-12 16:18                       ` Selvin Xavier
@ 2019-07-13  7:56                         ` Yi Zhang
  2019-07-13 16:00                           ` Selvin Xavier
  0 siblings, 1 reply; 18+ messages in thread
From: Yi Zhang @ 2019-07-13  7:56 UTC (permalink / raw)
  To: Selvin Xavier, Parav Pandit
  Cc: Daniel Jurgens, linux-rdma, Devesh Sharma, linux-nvme

Hi Selvin,

Confirmed the patch fixed the login issue.

And from the dmesg I found only one I/O queue is created; is that
reasonable? There are more queues created in my testing with IB/iWARP.

# nvme connect-all -t rdma -a 172.31.40.125 -s 4420

# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda       8:0    0 931.5G  0 disk
├─sda1    8:1    0     1G  0 part /boot
├─sda2    8:2    0    25G  0 part /mnt/rdma-ext4
├─sda3    8:3    0    25G  0 part /mnt/rdma-xfs
├─sda4    8:4    0     1K  0 part
├─sda5    8:5    0  15.7G  0 part [SWAP]
└─sda6    8:6    0 864.9G  0 part /
nvme0n1 259:405  0   250G  0 disk

# dmesg
[ 4311.635430] nvme nvme0: new ctrl: NQN 
"nqn.2014-08.org.nvmexpress.discovery", addr 172.31.40.125:4420
[ 4311.646687] nvme nvme0: Removing ctrl: NQN 
"nqn.2014-08.org.nvmexpress.discovery"
[ 4311.706052] nvme nvme0: creating 1 I/O queues.
[ 4311.717848] nvme nvme0: mapped 1/0/0 default/read/poll queues.
[ 4311.727384] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.40.125:4420

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz
Stepping:            2
CPU MHz:             3283.306
CPU max MHz:         3500.0000
CPU min MHz:         1200.0000
BogoMIPS:            5993.72
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            10240K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr 
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand 
lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp 
tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 
avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm 
ida arat pln pts flush_l1d

On 7/13/19 12:18 AM, Selvin Xavier wrote:
> On Fri, Jul 12, 2019 at 6:22 PM Parav Pandit <parav@mellanox.com> wrote:
>>
>>
>>> -----Original Message-----
>>> From: Yi Zhang <yi.zhang@redhat.com>
>>> Sent: Friday, July 12, 2019 5:11 PM
>>> To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
>>> <selvin.xavier@broadcom.com>
>>> Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
>>> Devesh Sharma <devesh.sharma@broadcom.com>; linux-
>>> nvme@lists.infradead.org
>>> Subject: Re: regression: nvme rdma with bnxt_re0 broken
>>>
>>> Hi Parav
>>> The nvme connect still failed [1]; I've pasted the full dmesg log at [2], please check it.
>>>
>>>
>>> [1]
>>> [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
>>> testnqn
>>> Failed to write to /dev/nvme-fabrics: Connection reset by peer
>>> [2]
>>> https://pastebin.com/ipvak0Sp
>>>
>> I think vlan_id is not initialized to 0xffff due to which ipv4 entry addition also failed with my patch.
>> This probably solves it; I am not sure, as I am not very familiar with this code.
>>
>> Selvin,
>> Can you please take a look?
>>
> Parav,
>   Thanks for the patch. I removed the change you made in struct bnxt_qplib_gid
> and added a new structure struct bnxt_qplib_gid_info to include both
> gid and vlan_id.
> struct bnxt_qplib_gid is used in multiple places to memcpy or compare
> the 16 bytes.
> Also, added vlan checking in the destroy path also. Some more cleanup
> possible in
> delete_gid path. That can be done once the basic issue is fixed.
>
> Yi, Can you please test this patch and see if it is solving the issue?
>
> Thanks,
> Selvin
>
> [snip: Selvin's patch, quoted in full from the message above]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-13  7:51                             ` Selvin Xavier
@ 2019-07-13 12:12                               ` Jason Gunthorpe
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2019-07-13 12:12 UTC (permalink / raw)
  To: Selvin Xavier
  Cc: Parav Pandit, Yi Zhang, Daniel Jurgens, linux-rdma,
	Devesh Sharma, linux-nvme

On Sat, Jul 13, 2019 at 01:21:54PM +0530, Selvin Xavier wrote:
> On Fri, Jul 12, 2019 at 11:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Jul 12, 2019 at 09:59:38PM +0530, Selvin Xavier wrote:
> >
> > > > bnxt guys: please just delete this duplicate detection code from the
> > > > driver. Every GID provided from the core must be programmed into the
> > > > given gid table index.
> > >
> > > Jason,
> > >  This check is required in bnxt_re because the HW need only one entry
> > > in its table for RoCE V1 and RoCE v2 Gids.
> >
> > The HW doesn't have a 'GID table' if it has this restriction. It
> > sounds like it has some 'IP table' or maybe 'IP and VLAN' table?
> >
> > So the driver must provide a full emulated 'GID Table' with all the
> > normal semantics.
> >
> > Which looks sort of like what the add side is doing; other than the
> > mis-naming, it seems reasonable.
> >
> > But then I see this in re_create_ah:
> >
> >         /*
> >          * If RoCE V2 is enabled, stack will have two entries for
> >          * each GID entry. Avoiding this duplicate entry in HW. Dividing
> >          * the GID index by 2 for RoCE V2
> >          */
> >         ah->qplib_ah.sgid_index = grh->sgid_index / 2;
> >
> > Which seems completely wrong - that is making assumptions about the
> > layout of the gid table that is just not allowed.
> >
> > Surely it needs to access the per-gid driver context that add_gid set
> > and use the index into the sgid_table from there?
> >
> Agreed. We need a mapping between the HW table index and the GID table index.
> We can either maintain a mapping in the driver or have an ib stack function to
> get the per-GID driver context from the GID table index. The latter makes
> sense only if other drivers also need such an interface, and I am not sure
> any other driver does.

I'd prefer a core function to return that context.

Even better would be to have the core allocate a larger entry and use
container_of.

Jason
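
As a minimal sketch of the approach discussed above (not the actual fix),
assume a core helper, hypothetical at the time of this thread and here called
rdma_read_gid_hw_context(), that returns the context a driver stored from its
add_gid() callback for a given ib_gid_attr. The bnxt_re_gid_ctx fields mirror
the ctx->idx and ctx->refcnt usage visible in the patch further down in this
thread.

struct bnxt_re_gid_ctx {
	u32 idx;	/* index into the HW SGID table */
	u32 refcnt;
};

/*
 * In create_ah, look up the HW SGID index that add_gid() recorded
 * instead of assuming the "sgid_index / 2" layout of the core table.
 */
static u32 bnxt_re_hw_sgid_index(const struct ib_gid_attr *attr)
{
	struct bnxt_re_gid_ctx *ctx = rdma_read_gid_hw_context(attr);

	return ctx->idx;
}

The container_of variant would instead have the core embed the ib_gid_attr
inside a driver-sized entry, letting the driver recover its private data with
container_of() and no helper call at all.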

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: regression: nvme rdma with bnxt_re0 broken
  2019-07-13  7:56                         ` Yi Zhang
@ 2019-07-13 16:00                           ` Selvin Xavier
  0 siblings, 0 replies; 18+ messages in thread
From: Selvin Xavier @ 2019-07-13 16:00 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Parav Pandit, Daniel Jurgens, linux-rdma, Devesh Sharma, linux-nvme

On Sat, Jul 13, 2019 at 1:26 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>
> Hi Selvin
>
> Confirmed that the patch fixed the nvme connect issue.
>
Thanks Yi for the confirmation. I will post this patch.

> And from the dmesg I found that only one I/O queue is created; is that
> reasonable? More queues are created during my testing with IB/iWARP devices.
num_comp_vectors exported by bnxt_re is 1, so a single I/O queue is expected.
This can be increased; I will post a patch after doing some testing.
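
As a rough sketch of why num_comp_vectors caps the I/O queue count, loosely
modeled on the v5.2-era nvme-rdma queue allocation (the function name here is
illustrative):

/*
 * Each I/O queue gets its own completion vector, so the number of
 * queues is bounded by both the online CPUs and the device's
 * num_comp_vectors. A device exporting num_comp_vectors == 1 yields
 * the single "creating 1 I/O queues" line in the dmesg below.
 */
static unsigned int nr_io_queues_for(struct ib_device *ibdev,
				     unsigned int requested)
{
	unsigned int nr = min_t(unsigned int, requested, num_online_cpus());

	return min_t(unsigned int, nr, ibdev->num_comp_vectors);
}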

>
> # nvme connect-all -t rdma -a 172.31.40.125 -s 4420
>
> # lsblk
> NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda       8:0    0 931.5G  0 disk
> ├─sda1    8:1    0     1G  0 part /boot
> ├─sda2    8:2    0    25G  0 part /mnt/rdma-ext4
> ├─sda3    8:3    0    25G  0 part /mnt/rdma-xfs
> ├─sda4    8:4    0     1K  0 part
> ├─sda5    8:5    0  15.7G  0 part [SWAP]
> └─sda6    8:6    0 864.9G  0 part /
> nvme0n1 259:405  0   250G  0 disk
>
> # dmesg
> [ 4311.635430] nvme nvme0: new ctrl: NQN
> "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.40.125:4420
> [ 4311.646687] nvme nvme0: Removing ctrl: NQN
> "nqn.2014-08.org.nvmexpress.discovery"
> [ 4311.706052] nvme nvme0: creating 1 I/O queues.
> [ 4311.717848] nvme nvme0: mapped 1/0/0 default/read/poll queues.
> [ 4311.727384] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.40.125:4420
>
> # lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              16
> On-line CPU(s) list: 0-15
> Thread(s) per core:  2
> Core(s) per socket:  4
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               63
> Model name:          Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz
> Stepping:            2
> CPU MHz:             3283.306
> CPU max MHz:         3500.0000
> CPU min MHz:         1200.0000
> BogoMIPS:            5993.72
> Virtualization:      VT-x
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            10240K
> NUMA node0 CPU(s):   0,2,4,6,8,10,12,14
> NUMA node1 CPU(s):   1,3,5,7,9,11,13,15
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
> nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp
> tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1
> avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm
> ida arat pln pts flush_l1d
>
> On 7/13/19 12:18 AM, Selvin Xavier wrote:
> > On Fri, Jul 12, 2019 at 6:22 PM Parav Pandit <parav@mellanox.com> wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Yi Zhang <yi.zhang@redhat.com>
> >>> Sent: Friday, July 12, 2019 5:11 PM
> >>> To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
> >>> <selvin.xavier@broadcom.com>
> >>> Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> >>> Devesh Sharma <devesh.sharma@broadcom.com>; linux-
> >>> nvme@lists.infradead.org
> >>> Subject: Re: regression: nvme rdma with bnxt_re0 broken
> >>>
> >>> Hi Parav
> >>> The nvme connect still failed [1]; I've pasted the full dmesg log at [2], please check it.
> >>>
> >>>
> >>> [1]
> >>> [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
> >>> testnqn
> >>> Failed to write to /dev/nvme-fabrics: Connection reset by peer
> >>> [2]
> >>> https://pastebin.com/ipvak0Sp
> >>>
> >> I think vlan_id is not initialized to 0xffff, which is why IPv4 entry addition also failed with my patch.
> >> This probably solves it. Not sure; I am not very familiar with this code.
> >>
> >> Selvin,
> >> Can you please take a look?
> >>
> > Parav,
> >   Thanks for the patch. I removed the change you made in struct bnxt_qplib_gid
> > and added a new structure, struct bnxt_qplib_gid_info, to include both
> > the gid and the vlan_id, since struct bnxt_qplib_gid is used in multiple
> > places to memcpy or compare the 16 bytes.
> > I also added vlan checking in the destroy path. Some more cleanup is
> > possible in the delete_gid path; that can be done once the basic issue
> > is fixed.
> >
> > Yi, Can you please test this patch and see if it is solving the issue?
> >
> > Thanks,
> > Selvin
> >
> >  From 3d83613cfc5993bf9dae529ab0d4d25ddefff29b Mon Sep 17 00:00:00 2001
> > From: Parav Pandit <parav@mellanox.com>
> > Date: Fri, 12 Jul 2019 10:32:51 -0400
> > Subject: [PATCH] IB/bnxt_re: Honor vlan_id in GID entry comparison
> >
> > A GID entry consists of GID, vlan, netdev and smac.
> > Extend the GID duplicate check comparisons to consider vlan_id as well,
> > to support IPv6 VLAN based link local addresses.
> >
> > Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> > Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
> > ---
> >   drivers/infiniband/hw/bnxt_re/ib_verbs.c  |  7 +++++--
> >   drivers/infiniband/hw/bnxt_re/qplib_res.c | 17 +++++++++++++----
> >   drivers/infiniband/hw/bnxt_re/qplib_res.h |  2 +-
> >   drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 14 +++++++++-----
> >   drivers/infiniband/hw/bnxt_re/qplib_sp.h  |  7 ++++++-
> >   5 files changed, 34 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > index a91653aabf38..098ab883733e 100644
> > --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > @@ -308,6 +308,7 @@ int bnxt_re_del_gid(const struct ib_gid_attr
> > *attr, void **context)
> >    struct bnxt_re_dev *rdev = to_bnxt_re_dev(attr->device, ibdev);
> >    struct bnxt_qplib_sgid_tbl *sgid_tbl = &rdev->qplib_res.sgid_tbl;
> >    struct bnxt_qplib_gid *gid_to_del;
> > + u16 vlan_id = 0xFFFF;
> >
> >    /* Delete the entry from the hardware */
> >    ctx = *context;
> > @@ -317,7 +318,8 @@ int bnxt_re_del_gid(const struct ib_gid_attr
> > *attr, void **context)
> >    if (sgid_tbl && sgid_tbl->active) {
> >    if (ctx->idx >= sgid_tbl->max)
> >    return -EINVAL;
> > - gid_to_del = &sgid_tbl->tbl[ctx->idx];
> > + gid_to_del = &sgid_tbl->tbl[ctx->idx].gid;
> > + vlan_id = sgid_tbl->tbl[ctx->idx].vlan_id;
> >    /* DEL_GID is called in WQ context(netdevice_event_work_handler)
> >    * or via the ib_unregister_device path. In the former case QP1
> >    * may not be destroyed yet, in which case just return as FW
> > @@ -335,7 +337,8 @@ int bnxt_re_del_gid(const struct ib_gid_attr
> > *attr, void **context)
> >    }
> >    ctx->refcnt--;
> >    if (!ctx->refcnt) {
> > - rc = bnxt_qplib_del_sgid(sgid_tbl, gid_to_del, true);
> > + rc = bnxt_qplib_del_sgid(sgid_tbl, gid_to_del,
> > + vlan_id,  true);
> >    if (rc) {
> >    dev_err(rdev_to_dev(rdev),
> >    "Failed to remove GID: %#x", rc);
> > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c
> > b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> > index 37928b1111df..7f2571f7a13f 100644
> > --- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
> > +++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> > @@ -488,7 +488,10 @@ static int bnxt_qplib_alloc_sgid_tbl(struct
> > bnxt_qplib_res *res,
> >         struct bnxt_qplib_sgid_tbl *sgid_tbl,
> >         u16 max)
> >   {
> > - sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid), GFP_KERNEL);
> > + u16 i;
> > +
> > + sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid_info),
> > + GFP_KERNEL);
> >    if (!sgid_tbl->tbl)
> >    return -ENOMEM;
> >
> > @@ -500,6 +503,9 @@ static int bnxt_qplib_alloc_sgid_tbl(struct
> > bnxt_qplib_res *res,
> >    if (!sgid_tbl->ctx)
> >    goto out_free2;
> >
> > + for (i = 0; i < max; i++)
> > + sgid_tbl->tbl[i].vlan_id = 0xffff;
> > +
> >    sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
> >    if (!sgid_tbl->vlan)
> >    goto out_free3;
> > @@ -526,9 +532,11 @@ static void bnxt_qplib_cleanup_sgid_tbl(struct
> > bnxt_qplib_res *res,
> >    for (i = 0; i < sgid_tbl->max; i++) {
> >    if (memcmp(&sgid_tbl->tbl[i], &bnxt_qplib_gid_zero,
> >       sizeof(bnxt_qplib_gid_zero)))
> > - bnxt_qplib_del_sgid(sgid_tbl, &sgid_tbl->tbl[i], true);
> > + bnxt_qplib_del_sgid(sgid_tbl, &sgid_tbl->tbl[i].gid,
> > +     sgid_tbl->tbl[i].vlan_id, true);
> >    }
> > - memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid) * sgid_tbl->max);
> > + memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid_info) *
> > + sgid_tbl->max);
> >    memset(sgid_tbl->hw_id, -1, sizeof(u16) * sgid_tbl->max);
> >    memset(sgid_tbl->vlan, 0, sizeof(u8) * sgid_tbl->max);
> >    sgid_tbl->active = 0;
> > @@ -537,7 +545,8 @@ static void bnxt_qplib_cleanup_sgid_tbl(struct
> > bnxt_qplib_res *res,
> >   static void bnxt_qplib_init_sgid_tbl(struct bnxt_qplib_sgid_tbl *sgid_tbl,
> >         struct net_device *netdev)
> >   {
> > - memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid) * sgid_tbl->max);
> > + memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid_info) *
> > + sgid_tbl->max);
> >    memset(sgid_tbl->hw_id, -1, sizeof(u16) * sgid_tbl->max);
> >   }
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.h
> > b/drivers/infiniband/hw/bnxt_re/qplib_res.h
> > index 30c42c92fac7..fbda11a7ab1a 100644
> > --- a/drivers/infiniband/hw/bnxt_re/qplib_res.h
> > +++ b/drivers/infiniband/hw/bnxt_re/qplib_res.h
> > @@ -111,7 +111,7 @@ struct bnxt_qplib_pd_tbl {
> >   };
> >
> >   struct bnxt_qplib_sgid_tbl {
> > - struct bnxt_qplib_gid *tbl;
> > + struct bnxt_qplib_gid_info *tbl;
> >    u16 *hw_id;
> >    u16 max;
> >    u16 active;
> > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> > b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> > index 48793d3512ac..40296b97d21e 100644
> > --- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> > +++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> > @@ -213,12 +213,12 @@ int bnxt_qplib_get_sgid(struct bnxt_qplib_res *res,
> >    index, sgid_tbl->max);
> >    return -EINVAL;
> >    }
> > - memcpy(gid, &sgid_tbl->tbl[index], sizeof(*gid));
> > + memcpy(gid, &sgid_tbl->tbl[index].gid, sizeof(*gid));
> >    return 0;
> >   }
> >
> >   int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
> > - struct bnxt_qplib_gid *gid, bool update)
> > + struct bnxt_qplib_gid *gid, u16 vlan_id, bool update)
> >   {
> >    struct bnxt_qplib_res *res = to_bnxt_qplib(sgid_tbl,
> >       struct bnxt_qplib_res,
> > @@ -236,7 +236,8 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl
> > *sgid_tbl,
> >    return -ENOMEM;
> >    }
> >    for (index = 0; index < sgid_tbl->max; index++) {
> > - if (!memcmp(&sgid_tbl->tbl[index], gid, sizeof(*gid)))
> > + if (!memcmp(&sgid_tbl->tbl[index].gid, gid, sizeof(*gid)) &&
> > +     vlan_id == sgid_tbl->tbl[index].vlan_id)
> >    break;
> >    }
> >    if (index == sgid_tbl->max) {
> > @@ -262,8 +263,9 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl
> > *sgid_tbl,
> >    if (rc)
> >    return rc;
> >    }
> > - memcpy(&sgid_tbl->tbl[index], &bnxt_qplib_gid_zero,
> > + memcpy(&sgid_tbl->tbl[index].gid, &bnxt_qplib_gid_zero,
> >           sizeof(bnxt_qplib_gid_zero));
> > + sgid_tbl->tbl[index].vlan_id = 0xFFFF;
> >    sgid_tbl->vlan[index] = 0;
> >    sgid_tbl->active--;
> >    dev_dbg(&res->pdev->dev,
> > @@ -296,7 +298,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl
> > *sgid_tbl,
> >    }
> >    free_idx = sgid_tbl->max;
> >    for (i = 0; i < sgid_tbl->max; i++) {
> > - if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
> > + if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
> > +     sgid_tbl->tbl[i].vlan_id == vlan_id) {
> >    dev_dbg(&res->pdev->dev,
> >    "SGID entry already exist in entry %d!\n", i);
> >    *index = i;
> > @@ -351,6 +354,7 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl
> > *sgid_tbl,
> >    }
> >    /* Add GID to the sgid_tbl */
> >    memcpy(&sgid_tbl->tbl[free_idx], gid, sizeof(*gid));
> > + sgid_tbl->tbl[free_idx].vlan_id = vlan_id;
> >    sgid_tbl->active++;
> >    if (vlan_id != 0xFFFF)
> >    sgid_tbl->vlan[free_idx] = 1;
> > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
> > b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
> > index 0ec3b12b0bcd..b5c4ce302c61 100644
> > --- a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
> > +++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
> > @@ -84,6 +84,11 @@ struct bnxt_qplib_gid {
> >    u8 data[16];
> >   };
> >
> > +struct bnxt_qplib_gid_info {
> > + struct bnxt_qplib_gid gid;
> > + u16 vlan_id;
> > +};
> > +
> >   struct bnxt_qplib_ah {
> >    struct bnxt_qplib_gid dgid;
> >    struct bnxt_qplib_pd *pd;
> > @@ -221,7 +226,7 @@ int bnxt_qplib_get_sgid(struct bnxt_qplib_res *res,
> >    struct bnxt_qplib_sgid_tbl *sgid_tbl, int index,
> >    struct bnxt_qplib_gid *gid);
> >   int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
> > - struct bnxt_qplib_gid *gid, bool update);
> > + struct bnxt_qplib_gid *gid, u16 vlan_id, bool update);
> >   int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
> >    struct bnxt_qplib_gid *gid, u8 *mac, u16 vlan_id,
> >    bool update, u32 *index);

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-07-13 16:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <1310083272.27124086.1562836112586.JavaMail.zimbra@redhat.com>
     [not found] ` <619411460.27128070.1562838433020.JavaMail.zimbra@redhat.com>
2019-07-11 16:13   ` regression: nvme rdma with bnxt_re0 broken Sagi Grimberg
     [not found]   ` <AM0PR05MB48664657022ECA8526E3C967D1F30@AM0PR05MB4866.eurprd05.prod.outlook.com>
2019-07-11 16:18     ` Parav Pandit
2019-07-12  1:53       ` Yi Zhang
2019-07-12  2:49         ` Parav Pandit
2019-07-12  3:45           ` Selvin Xavier
2019-07-12  9:28             ` Parav Pandit
2019-07-12  9:39               ` Parav Pandit
2019-07-12  9:49                 ` Parav Pandit
2019-07-12 11:41                   ` Yi Zhang
2019-07-12 12:52                     ` Parav Pandit
2019-07-12 15:40                       ` Jason Gunthorpe
2019-07-12 16:29                         ` Selvin Xavier
2019-07-12 17:42                           ` Jason Gunthorpe
2019-07-13  7:51                             ` Selvin Xavier
2019-07-13 12:12                               ` Jason Gunthorpe
2019-07-12 16:18                       ` Selvin Xavier
2019-07-13  7:56                         ` Yi Zhang
2019-07-13 16:00                           ` Selvin Xavier
