linux-rdma.vger.kernel.org archive mirror
* 【Ask for help】 A question for __ib_cache_gid_add()
@ 2019-11-01  9:36 oulijun
  2019-11-01 13:05 ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: oulijun @ 2019-11-01  9:36 UTC (permalink / raw)
  To: Jason Gunthorpe, Doug Ledford; +Cc: linux-rdma

Hi,
  I am testing a hip08 NIC port on Ubuntu (5.0.0 kernel). When I change the port's MAC address from mac1 to mac2
and then restore it to mac1, GID 0 and GID 1 of the RoCE device become unavailable, and reading
/sys/class/infiniband/hns_0/ports/1/gid_attrs/ndevs/0 reports "invalid".
The kernel log shows the following messages:

  Oct 16 17:59:36 ubuntu kernel: [200635.496317] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
Oct 16 17:59:37 ubuntu kernel: [200636.705848] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
Oct 16 17:59:37 ubuntu kernel: [200636.705854] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
Oct 16 17:59:39 ubuntu kernel: [200638.755828] hns3 0000:bd:00.0 enp189s0f0: link up
Oct 16 17:59:39 ubuntu kernel: [200638.755847] IPv6: ADDRCONF(NETDEV_CHANGE): enp189s0f0: link becomes ready
Oct 16 18:00:56 ubuntu kernel: [200715.699961] hns3 0000:bd:00.0 enp189s0f0: link down
Oct 16 18:00:56 ubuntu kernel: [200716.016142] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
Oct 16 18:00:58 ubuntu kernel: [200717.229857] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
Oct 16 18:00:58 ubuntu kernel: [200717.229863] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28

Has anyone else encountered a similar problem? I wonder whether __ib_cache_gid_add() is defective in the 5.0 kernel.

Thanks
Lijun
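
For readers triaging similar logs: on Linux, errno 28 is ENOSPC ("No space left on device"), so error=-28 may mean the add path found no free slot in the GID table, although nothing in this thread confirms that. A minimal sketch for counting these failures per GID is shown here. The function name summarize_gid_failures is hypothetical, and the parsing assumes the log line format quoted in the message above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print one line per
# (GID, error) pair seen in "__ib_cache_gid_add: unable to add gid" messages.
summarize_gid_failures() {
    awk '/__ib_cache_gid_add: unable to add gid/ {
        # The GID and the error code are the two fields after the word "gid".
        for (i = 1; i <= NF; i++) {
            if ($i == "gid") {
                key = $(i + 1) " " $(i + 2)
                count[key]++
            }
        }
    }
    END {
        for (k in count) print k, "count=" count[k]
    }' | sort
}
```

It could be fed from a saved log, for example: summarize_gid_failures < /var/log/kern.log (the log path is an assumption and varies by distribution).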



* Re: 【Ask for help】 A question for __ib_cache_gid_add()
  2019-11-01  9:36 【Ask for help】 A question for __ib_cache_gid_add() oulijun
@ 2019-11-01 13:05 ` Jason Gunthorpe
  2019-11-01 13:21   ` Mark Zhang
  2019-11-01 16:00   ` Parav Pandit
  0 siblings, 2 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2019-11-01 13:05 UTC (permalink / raw)
  To: oulijun, Parav Pandit; +Cc: Doug Ledford, linux-rdma

On Fri, Nov 01, 2019 at 05:36:36PM +0800, oulijun wrote:
> Hi
>   I am using the ubuntu system(5.0.0 kernel) to test the hip08 NIC port,. When I modify the perr mac1 to mac2,then restore to mac1, it will cause
> the gid0 and gid 1 of the roce to be unavailable, and check that the /sys/class/infiniband/hns_0/ports/1/gid_attrs/ndevs/0 is show invalid.
> the protocol stack print will appear.
> 
>   Oct 16 17:59:36 ubuntu kernel: [200635.496317] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
> Oct 16 17:59:37 ubuntu kernel: [200636.705848] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
> Oct 16 17:59:37 ubuntu kernel: [200636.705854] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
> Oct 16 17:59:39 ubuntu kernel: [200638.755828] hns3 0000:bd:00.0 enp189s0f0: link up
> Oct 16 17:59:39 ubuntu kernel: [200638.755847] IPv6: ADDRCONF(NETDEV_CHANGE): enp189s0f0: link becomes ready
> Oct 16 18:00:56 ubuntu kernel: [200715.699961] hns3 0000:bd:00.0 enp189s0f0: link down
> Oct 16 18:00:56 ubuntu kernel: [200716.016142] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
> Oct 16 18:00:58 ubuntu kernel: [200717.229857] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
> Oct 16 18:00:58 ubuntu kernel: [200717.229863] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
> 
> Has anyone else encounterd a similar problem ? I wonder if the _ib_cache_add_gid() is defective in 5.0 kernel?

Maybe Parav knows?

Jason


* Re: 【Ask for help】 A question for __ib_cache_gid_add()
  2019-11-01 13:05 ` Jason Gunthorpe
@ 2019-11-01 13:21   ` Mark Zhang
  2019-11-01 13:36     ` Mark Zhang
  2019-11-05  8:47     ` oulijun
  2019-11-01 16:00   ` Parav Pandit
  1 sibling, 2 replies; 8+ messages in thread
From: Mark Zhang @ 2019-11-01 13:21 UTC (permalink / raw)
  To: Jason Gunthorpe, oulijun, Parav Pandit; +Cc: Doug Ledford, linux-rdma

On 11/1/2019 9:05 PM, Jason Gunthorpe wrote:
> On Fri, Nov 01, 2019 at 05:36:36PM +0800, oulijun wrote:
>> Hi
>>    I am using the ubuntu system(5.0.0 kernel) to test the hip08 NIC port,. When I modify the perr mac1 to mac2,then restore to mac1, it will cause
>> the gid0 and gid 1 of the roce to be unavailable, and check that the /sys/class/infiniband/hns_0/ports/1/gid_attrs/ndevs/0 is show invalid.
>> the protocol stack print will appear.
>>
>>    Oct 16 17:59:36 ubuntu kernel: [200635.496317] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
>> Oct 16 17:59:37 ubuntu kernel: [200636.705848] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
>> Oct 16 17:59:37 ubuntu kernel: [200636.705854] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
>> Oct 16 17:59:39 ubuntu kernel: [200638.755828] hns3 0000:bd:00.0 enp189s0f0: link up
>> Oct 16 17:59:39 ubuntu kernel: [200638.755847] IPv6: ADDRCONF(NETDEV_CHANGE): enp189s0f0: link becomes ready
>> Oct 16 18:00:56 ubuntu kernel: [200715.699961] hns3 0000:bd:00.0 enp189s0f0: link down
>> Oct 16 18:00:56 ubuntu kernel: [200716.016142] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
>> Oct 16 18:00:58 ubuntu kernel: [200717.229857] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
>> Oct 16 18:00:58 ubuntu kernel: [200717.229863] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
>>
>> Has anyone else encounterd a similar problem ? I wonder if the _ib_cache_add_gid() is defective in 5.0 kernel?
> 
> Maybe Parav knows?
> 
One possibility is that a port reset happens during the operation.
All GIDs are then deleted and re-created. But if GID entries #0 and
#1 are in use during the rebuild, they cannot be deleted, and you
get this error.

> Jason
> 



* Re: 【Ask for help】 A question for __ib_cache_gid_add()
  2019-11-01 13:21   ` Mark Zhang
@ 2019-11-01 13:36     ` Mark Zhang
  2019-11-05  8:47     ` oulijun
  1 sibling, 0 replies; 8+ messages in thread
From: Mark Zhang @ 2019-11-01 13:36 UTC (permalink / raw)
  To: Jason Gunthorpe, oulijun, Parav Pandit; +Cc: Doug Ledford, linux-rdma

On 11/1/2019 9:21 PM, Mark Zhang wrote:
> On 11/1/2019 9:05 PM, Jason Gunthorpe wrote:
>> On Fri, Nov 01, 2019 at 05:36:36PM +0800, oulijun wrote:
>>> Hi
>>>     I am using the ubuntu system(5.0.0 kernel) to test the hip08 NIC port,. When I modify the perr mac1 to mac2,then restore to mac1, it will cause
>>> the gid0 and gid 1 of the roce to be unavailable, and check that the /sys/class/infiniband/hns_0/ports/1/gid_attrs/ndevs/0 is show invalid.
>>> the protocol stack print will appear.
>>>
>>>     Oct 16 17:59:36 ubuntu kernel: [200635.496317] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
>>> Oct 16 17:59:37 ubuntu kernel: [200636.705848] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
>>> Oct 16 17:59:37 ubuntu kernel: [200636.705854] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
>>> Oct 16 17:59:39 ubuntu kernel: [200638.755828] hns3 0000:bd:00.0 enp189s0f0: link up
>>> Oct 16 17:59:39 ubuntu kernel: [200638.755847] IPv6: ADDRCONF(NETDEV_CHANGE): enp189s0f0: link becomes ready
>>> Oct 16 18:00:56 ubuntu kernel: [200715.699961] hns3 0000:bd:00.0 enp189s0f0: link down
>>> Oct 16 18:00:56 ubuntu kernel: [200716.016142] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
>>> Oct 16 18:00:58 ubuntu kernel: [200717.229857] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
>>> Oct 16 18:00:58 ubuntu kernel: [200717.229863] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
>>>
>>> Has anyone else encounterd a similar problem ? I wonder if the _ib_cache_add_gid() is defective in 5.0 kernel?
>>
>> Maybe Parav knows?
>>
> One possibility is, during the operation you have port reset; Then
> all gids will be deleted and re-created. But during gids re-build
> gid entry #0 and #1 is in use, so they cannot be deleted then you
> get this error.
> 
I mean that you then get this error message, which is expected. Check
all valid GID entries before and after the operation.
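
One way to do that check is to dump the GID table from sysfs before and after the MAC change and compare the two dumps. The sketch below is only an illustration: dump_gid_table is a hypothetical name, and it assumes the usual sysfs layout with gids/<index> and gid_attrs/ndevs/<index> under the port directory. An entry whose netdev attribute cannot be read is printed as "invalid".

```shell
# Hypothetical helper: print "<index> <gid> <netdev>" for every GID entry
# found under the given port directory, sorted by index.
dump_gid_table() {
    port_dir=$1
    for gid_file in "$port_dir"/gids/*; do
        [ -e "$gid_file" ] || continue
        idx=${gid_file##*/}
        gid=$(cat "$gid_file" 2>/dev/null) || gid="unreadable"
        # Reading ndevs/<index> fails for entries with no netdev attached.
        ndev=$(cat "$port_dir/gid_attrs/ndevs/$idx" 2>/dev/null) || ndev="invalid"
        printf '%s %s %s\n' "$idx" "$gid" "$ndev"
    done | sort -n
}
```

For the device in this thread the call would be dump_gid_table /sys/class/infiniband/hns_0/ports/1 > before.txt, repeated into after.txt once the MAC has been restored, followed by diff before.txt after.txt.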

>> Jason
>>
> 



* RE: 【Ask for help】 A question for __ib_cache_gid_add()
  2019-11-01 13:05 ` Jason Gunthorpe
  2019-11-01 13:21   ` Mark Zhang
@ 2019-11-01 16:00   ` Parav Pandit
  2019-11-13 12:35     ` oulijun
  1 sibling, 1 reply; 8+ messages in thread
From: Parav Pandit @ 2019-11-01 16:00 UTC (permalink / raw)
  To: Jason Gunthorpe, oulijun; +Cc: Doug Ledford, linux-rdma

Hi Lijun,

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Friday, November 1, 2019 8:06 AM
> To: oulijun <oulijun@huawei.com>; Parav Pandit <parav@mellanox.com>
> Cc: Doug Ledford <dledford@redhat.com>; linux-rdma <linux-
> rdma@vger.kernel.org>
> Subject: Re: 【Ask for help】 A question for __ib_cache_gid_add()
> 
> On Fri, Nov 01, 2019 at 05:36:36PM +0800, oulijun wrote:
> > Hi
> >   I am using the ubuntu system(5.0.0 kernel) to test the hip08 NIC
> > port,. When I modify the perr mac1 to mac2,then restore to mac1, it will
> cause the gid0 and gid 1 of the roce to be unavailable, and check that the
> /sys/class/infiniband/hns_0/ports/1/gid_attrs/ndevs/0 is show invalid.
> > the protocol stack print will appear.
> >
> >   Oct 16 17:59:36 ubuntu kernel: [200635.496317] __ib_cache_gid_add:
> > unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
> > Oct 16 17:59:37 ubuntu kernel: [200636.705848] 8021q: adding VLAN 0 to
> > HW filter on device enp189s0f0 Oct 16 17:59:37 ubuntu kernel:
> > [200636.705854] __ib_cache_gid_add: unable to add gid
> > fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28 Oct 16 17:59:39
> > ubuntu kernel: [200638.755828] hns3 0000:bd:00.0 enp189s0f0: link up
> > Oct 16 17:59:39 ubuntu kernel: [200638.755847] IPv6:
> > ADDRCONF(NETDEV_CHANGE): enp189s0f0: link becomes ready Oct 16
> > 18:00:56 ubuntu kernel: [200715.699961] hns3 0000:bd:00.0 enp189s0f0:
> > link down Oct 16 18:00:56 ubuntu kernel: [200716.016142]
> > __ib_cache_gid_add: unable to add gid
> > fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28 Oct 16 18:00:58
> > ubuntu kernel: [200717.229857] 8021q: adding VLAN 0 to HW filter on
> > device enp189s0f0 Oct 16 18:00:58 ubuntu kernel: [200717.229863]
> > __ib_cache_gid_add: unable to add gid
> > fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
> >
> > Has anyone else encounterd a similar problem ? I wonder if the
> _ib_cache_add_gid() is defective in 5.0 kernel?
> 
> Maybe Parav knows?

I used the kernel from [1], which seems to be fine; it has the required commits [2], [3], [4].

Are you running RDMA traffic/applications that use GID 0 and 1 while changing the MAC?
If so, an administrative operation such as a MAC address change during active RDMA traffic is unsupported and can lead to this error.
Can you please confirm?

If you are not running RDMA traffic while changing the MAC, I need more debug logs.
Can you please enable ftrace and share the output file mac_change_trace.txt, using the steps below?

echo 0 > /sys/kernel/debug/tracing/tracing_on
echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo > /sys/kernel/debug/tracing/trace
echo > /sys/kernel/debug/tracing/set_ftrace_filter
echo ':mod:ib*' > /sys/kernel/debug/tracing/set_ftrace_filter
echo ':mod:rdma*' >> /sys/kernel/debug/tracing/set_ftrace_filter
echo 1 > /sys/kernel/debug/tracing/tracing_on

ip link set <netdev> address <new_mac1>
ip link set <netdev> address <new_mac2>
cat /sys/kernel/debug/tracing/trace > mac_change_trace.txt

[1] git://git.launchpad.net/~ubuntu-kernel-test/ubuntu/+source/linux/+git/mainline-crack v5.0
[2] commit 5c5702e259dc ("RDMA/core: Set right entry state before releasing reference")
[3] commit be5914c124bc ("RDMA/core: Delete RoCE GID in hw when corresponding IP is deleted")
[4] commit d12e2eed2743 ("IB/core: Update GID entries for netdevice whose mac address changes")



* Re: 【Ask for help】 A question for __ib_cache_gid_add()
  2019-11-01 13:21   ` Mark Zhang
  2019-11-01 13:36     ` Mark Zhang
@ 2019-11-05  8:47     ` oulijun
  1 sibling, 0 replies; 8+ messages in thread
From: oulijun @ 2019-11-05  8:47 UTC (permalink / raw)
  To: Mark Zhang, Jason Gunthorpe, Parav Pandit; +Cc: Doug Ledford, linux-rdma

On 2019/11/1 21:21, Mark Zhang wrote:
> On 11/1/2019 9:05 PM, Jason Gunthorpe wrote:
>> On Fri, Nov 01, 2019 at 05:36:36PM +0800, oulijun wrote:
>>> Hi
>>>    I am using the ubuntu system(5.0.0 kernel) to test the hip08 NIC port,. When I modify the perr mac1 to mac2,then restore to mac1, it will cause
>>> the gid0 and gid 1 of the roce to be unavailable, and check that the /sys/class/infiniband/hns_0/ports/1/gid_attrs/ndevs/0 is show invalid.
>>> the protocol stack print will appear.
>>>
>>>    Oct 16 17:59:36 ubuntu kernel: [200635.496317] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
>>> Oct 16 17:59:37 ubuntu kernel: [200636.705848] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
>>> Oct 16 17:59:37 ubuntu kernel: [200636.705854] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
>>> Oct 16 17:59:39 ubuntu kernel: [200638.755828] hns3 0000:bd:00.0 enp189s0f0: link up
>>> Oct 16 17:59:39 ubuntu kernel: [200638.755847] IPv6: ADDRCONF(NETDEV_CHANGE): enp189s0f0: link becomes ready
>>> Oct 16 18:00:56 ubuntu kernel: [200715.699961] hns3 0000:bd:00.0 enp189s0f0: link down
>>> Oct 16 18:00:56 ubuntu kernel: [200716.016142] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
>>> Oct 16 18:00:58 ubuntu kernel: [200717.229857] 8021q: adding VLAN 0 to HW filter on device enp189s0f0
>>> Oct 16 18:00:58 ubuntu kernel: [200717.229863] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
>>>
>>> Has anyone else encounterd a similar problem ? I wonder if the _ib_cache_add_gid() is defective in 5.0 kernel?
>> Maybe Parav knows?
>>
> One possibility is, during the operation you have port reset; Then
> all gids will be deleted and re-created. But during gids re-build
> gid entry #0 and #1 is in use, so they cannot be deleted then you
> get this error.
Thanks for your reply. I will check it against my logs.
>> Jason
>>




* Re: 【Ask for help】 A question for __ib_cache_gid_add()
  2019-11-01 16:00   ` Parav Pandit
@ 2019-11-13 12:35     ` oulijun
  2019-11-13 18:03       ` Parav Pandit
  0 siblings, 1 reply; 8+ messages in thread
From: oulijun @ 2019-11-13 12:35 UTC (permalink / raw)
  To: Parav Pandit, Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma

On 2019/11/2 0:00, Parav Pandit wrote:
> Hi Lijun,
>
>> -----Original Message-----
>> From: Jason Gunthorpe <jgg@ziepe.ca>
>> Sent: Friday, November 1, 2019 8:06 AM
>> To: oulijun <oulijun@huawei.com>; Parav Pandit <parav@mellanox.com>
>> Cc: Doug Ledford <dledford@redhat.com>; linux-rdma <linux-
>> rdma@vger.kernel.org>
>> Subject: Re: 【Ask for help】 A question for __ib_cache_gid_add()
>>
>> On Fri, Nov 01, 2019 at 05:36:36PM +0800, oulijun wrote:
>>> Hi
>>>   I am using the ubuntu system(5.0.0 kernel) to test the hip08 NIC
>>> port,. When I modify the perr mac1 to mac2,then restore to mac1, it will
>> cause the gid0 and gid 1 of the roce to be unavailable, and check that the
>> /sys/class/infiniband/hns_0/ports/1/gid_attrs/ndevs/0 is show invalid.
>>> the protocol stack print will appear.
>>>
>>>   Oct 16 17:59:36 ubuntu kernel: [200635.496317] __ib_cache_gid_add:
>>> unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
>>> Oct 16 17:59:37 ubuntu kernel: [200636.705848] 8021q: adding VLAN 0 to
>>> HW filter on device enp189s0f0 Oct 16 17:59:37 ubuntu kernel:
>>> [200636.705854] __ib_cache_gid_add: unable to add gid
>>> fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28 Oct 16 17:59:39
>>> ubuntu kernel: [200638.755828] hns3 0000:bd:00.0 enp189s0f0: link up
>>> Oct 16 17:59:39 ubuntu kernel: [200638.755847] IPv6:
>>> ADDRCONF(NETDEV_CHANGE): enp189s0f0: link becomes ready Oct 16
>>> 18:00:56 ubuntu kernel: [200715.699961] hns3 0000:bd:00.0 enp189s0f0:
>>> link down Oct 16 18:00:56 ubuntu kernel: [200716.016142]
>>> __ib_cache_gid_add: unable to add gid
>>> fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28 Oct 16 18:00:58
>>> ubuntu kernel: [200717.229857] 8021q: adding VLAN 0 to HW filter on
>>> device enp189s0f0 Oct 16 18:00:58 ubuntu kernel: [200717.229863]
>>> __ib_cache_gid_add: unable to add gid
>>> fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
>>>
>>> Has anyone else encounterd a similar problem ? I wonder if the
>> _ib_cache_add_gid() is defective in 5.0 kernel?
>>
>> Maybe Parav knows?
> I used the kernel from [1], which seems to be fine; it has the required commits [2], [3], [4].
>
> Are you running RDMA traffic/applications which are using GID 0 and 1 when changing MAC?
> If so, administrative operation such as MAC address change during active RDMA traffic is unsupported, which can lead to this error.
> Can you please confirm?
Hi Parav,
    If an RDMA application is using a VLAN GID when the VLAN is removed with vconfig rem, is the following error expected?

 Oct 11 13:51:13 ubuntu kernel: [10408.846497] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:51:23 ubuntu kernel: [10418.926477] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:51:33 ubuntu kernel: [10429.006489] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:51:43 ubuntu kernel: [10439.086477] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:51:53 ubuntu kernel: [10449.166493] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:52:04 ubuntu kernel: [10459.246473] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:52:14 ubuntu kernel: [10469.326478] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:52:24 ubuntu kernel: [10479.406470] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:52:34 ubuntu kernel: [10489.486495] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:52:44 ubuntu kernel: [10499.566476] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4
Oct 11 13:52:54 ubuntu kernel: [10509.646600] unregister_netdevice: waiting for eno1.1000 to become free. Usage count = 4


Thanks

> If you are not running RDMA traffic while changing the mac, I need more debug logs.
> Can you please enable ftrace and share the output file mac_change_trace.txt using below steps?
>
> echo 0 > /sys/kernel/debug/tracing/tracing_on
> echo function_graph > /sys/kernel/debug/tracing/current_tracer
> echo > /sys/kernel/debug/tracing/trace
> echo > /sys/kernel/debug/tracing/set_ftrace_filter
> echo ':mod:ib*' > /sys/kernel/debug/tracing/set_ftrace_filter
> echo ':mod:rdma*' >> /sys/kernel/debug/tracing/set_ftrace_filter
> echo 1 > /sys/kernel/debug/tracing/tracing_on
>
> ip link set <netdev> address <new_mac1>
> ip link set <netdev> address <new_mac2>
> cat /sys/kernel/debug/tracing/trace > mac_change_trace.txt
>
> [1] git://git.launchpad.net/~ubuntu-kernel-test/ubuntu/+source/linux/+git/mainline-crack v5.0
> [2] commit 5c5702e259dc ("RDMA/core: Set right entry state before releasing reference")
> [3] commit be5914c124bc ("RDMA/core: Delete RoCE GID in hw when corresponding IP is deleted")
> [4] commit d12e2eed2743 ("IB/core: Update GID entries for netdevice whose mac address changes")
>
>
> .
>




* RE: 【Ask for help】 A question for __ib_cache_gid_add()
  2019-11-13 12:35     ` oulijun
@ 2019-11-13 18:03       ` Parav Pandit
  0 siblings, 0 replies; 8+ messages in thread
From: Parav Pandit @ 2019-11-13 18:03 UTC (permalink / raw)
  To: oulijun, Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma

Hi Lijun,

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of oulijun
> Sent: Wednesday, November 13, 2019 6:36 AM
> On 2019/11/2 0:00, Parav Pandit wrote:
> > Hi Lijun,
> >
> >> -----Original Message-----
> >> From: Jason Gunthorpe <jgg@ziepe.ca>
> >> Sent: Friday, November 1, 2019 8:06 AM
> >> To: oulijun <oulijun@huawei.com>; Parav Pandit <parav@mellanox.com>
> >> Cc: Doug Ledford <dledford@redhat.com>; linux-rdma <linux-
> >> rdma@vger.kernel.org>
> >> Subject: Re: 【Ask for help】 A question for __ib_cache_gid_add()
> >>
> >> On Fri, Nov 01, 2019 at 05:36:36PM +0800, oulijun wrote:
> >>> Hi
> >>>   I am using the ubuntu system(5.0.0 kernel) to test the hip08 NIC
> >>> port,. When I modify the perr mac1 to mac2,then restore to mac1, it
> >>> will
> >> cause the gid0 and gid 1 of the roce to be unavailable, and check
> >> that the
> >> /sys/class/infiniband/hns_0/ports/1/gid_attrs/ndevs/0 is show invalid.
> >>> the protocol stack print will appear.
> >>>
> >>>   Oct 16 17:59:36 ubuntu kernel: [200635.496317] __ib_cache_gid_add:
> >>> unable to add gid fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28
> >>> Oct 16 17:59:37 ubuntu kernel: [200636.705848] 8021q: adding VLAN 0
> >>> to HW filter on device enp189s0f0 Oct 16 17:59:37 ubuntu kernel:
> >>> [200636.705854] __ib_cache_gid_add: unable to add gid
> >>> fe80:0000:0000:0000:4600:4dff:fea7:9599 error=-28 Oct 16 17:59:39
> >>> ubuntu kernel: [200638.755828] hns3 0000:bd:00.0 enp189s0f0: link up
> >>> Oct 16 17:59:39 ubuntu kernel: [200638.755847] IPv6:
> >>> ADDRCONF(NETDEV_CHANGE): enp189s0f0: link becomes ready Oct 16
> >>> 18:00:56 ubuntu kernel: [200715.699961] hns3 0000:bd:00.0 enp189s0f0:
> >>> link down Oct 16 18:00:56 ubuntu kernel: [200716.016142]
> >>> __ib_cache_gid_add: unable to add gid
> >>> fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28 Oct 16 18:00:58
> >>> ubuntu kernel: [200717.229857] 8021q: adding VLAN 0 to HW filter on
> >>> device enp189s0f0 Oct 16 18:00:58 ubuntu kernel: [200717.229863]
> >>> __ib_cache_gid_add: unable to add gid
> >>> fe80:0000:0000:0000:4600:4dff:fea7:95f4 error=-28
> >>>
> >>> Has anyone else encounterd a similar problem ? I wonder if the
> >> _ib_cache_add_gid() is defective in 5.0 kernel?
> >>
> >> Maybe Parav knows?
> > I used the kernel from [1], which seems to be fine; it has the required
> commits [2], [3], [4].
> >
> > Are you running RDMA traffic/applications which are using GID 0 and 1 when
> changing MAC?
> > If so, administrative operation such as MAC address change during active
> RDMA traffic is unsupported, which can lead to this error.
> > Can you please confirm?
> Hi, parav Pandit
>     if running RDMA traffic/application which are using vlan gid when vconfig
> rem the vlan, it will happen the following error?

Yes, that is correct. The application is bound to the RDMA device and its associated VLAN netdevice, and holds a reference to it while QPs are active.
Once the application releases the QPs, unregistration should proceed.

Dynamic detachment of netdevices from the RoCE device was added by the 7 patches below; they are not present in the Ubuntu 5.0.0 kernel that I looked at.

3bf3e2b881c1 RDMA/rxe: Consider skb reserve space based on netdev of GID
8f9748602491 IB/cm: Reduce dependency on gid attribute ndev check
a70c07397fd8 RDMA: Introduce and use GID attr helper to read RoCE L2 fields
adb4a57a7a1d RDMA/cma: Use rdma_read_gid_attr_ndev_rcu to access netdev
dab2175800ef RDMA/rxe: Use rdma_read_gid_attr_ndev_rcu to access netdev
5102eca9039b net/smc: Use rdma_read_gid_l2_fields to L2 fields
943bd984b108 RDMA/core: Allow detaching gid attribute netdevice for RoCE
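
To check whether a given kernel tree already contains these patches, something like the following sketch may help. It is an assumption-laden illustration: missing_commits is a hypothetical name, it relies on git merge-base --is-ancestor, and it only works for trees that carry the upstream commits with their original hashes. Distribution kernels often backport patches as cherry-picks with different hashes, in which case a commit reported as missing may still be present under another ID.

```shell
# Hypothetical helper: print every commit from the argument list that is not
# an ancestor of the given ref in the given repository.
# Usage: missing_commits <repo-dir> <ref> <commit>...
missing_commits() {
    repo=$1
    ref=$2
    shift 2
    for commit in "$@"; do
        # Unknown commit IDs make git fail, so they are reported as missing too.
        if ! git -C "$repo" merge-base --is-ancestor "$commit" "$ref" 2>/dev/null; then
            echo "$commit"
        fi
    done
}
```

For example, missing_commits /path/to/linux v5.0 3bf3e2b881c1 8f9748602491 a70c07397fd8 adb4a57a7a1d dab2175800ef 5102eca9039b 943bd984b108 would list whichever of the seven are not reachable from the v5.0 tag (the repository path is a placeholder).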


