* mlx5 hot unplug regression on z/VM
@ 2020-07-06 16:12 Niklas Schnelle
  2020-07-07  9:08 ` Shay Drory
  0 siblings, 1 reply; 2+ messages in thread
From: Niklas Schnelle @ 2020-07-06 16:12 UTC (permalink / raw)
  To: Shay Drory; +Cc: netdev, Stefan Raspl

[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]

Hi Mr. Drory, Hi Netdev List,

I'm the PCI subsystem maintainer for Linux on IBM Z, and since v5.8-rc1
we've been seeing a regression with hot unplug of ConnectX-4 VFs
from z/VM guests. In -rc1 this still looked like a simple issue, and
I wrote the following mail:
https://lkml.org/lkml/2020/6/12/376
Sadly, since (I believe) -rc2 I've no longer been able to get this working
consistently (it did work consistently on -rc1 with the change described in
that mail). In his reply, Saeed Mahameed pointed me to your commits as
dealing with similar issues, so I wanted to get some input on how to debug
this further.

The commands I used to test this are as follows (on a z/VM guest running
Fedora 31 with a vanilla debug_defconfig v5.8-rc4 kernel); the resulting
dmesg is attached to this mail:

# vmcp q pcif  // query for available PCI devices
# vmcp attach pcif <FID> to \* // where <FID> is one of the ones listed by the above command
# vmcp detach pcif <FID> // This does a hot unplug and is where things start going wrong

I assume you don't have access to the hardware, but I'll be happy to assist
as well as I can; digging on my own, I sadly don't know enough about the
mlx5_core driver to make more progress.

Best regards,
Niklas Schnelle

[-- Attachment #2: dmesg_mlx5_detach_zvm.txt --]
[-- Type: text/plain, Size: 7690 bytes --]

[   70.773478] IPv6: ADDRCONF(NETDEV_CHANGE): ens514np0: link becomes ready
[   70.858688] RPC: Registered named UNIX socket transport module.
[   70.858709] RPC: Registered udp transport module.
[   70.858714] RPC: Registered tcp transport module.
[   70.858719] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   70.922689] RPC: Registered rdma transport module.
[   70.922724] RPC: Registered rdma backchannel transport module.
[   92.022705] mlx5_core 0000:00:00.0: poll_health:702:(pid 0): Fatal error 1 detected
[   92.022995] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[0] 0xffffffff
[   92.023002] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[1] 0xffffffff
[   92.023009] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[2] 0xffffffff
[   92.023016] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[3] 0xffffffff
[   92.023023] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[4] 0xffffffff
[   92.023030] mlx5_core 0000:00:00.0: print_health_info:383:(pid 0): assert_exit_ptr 0xffffffff
[   92.023038] mlx5_core 0000:00:00.0: print_health_info:385:(pid 0): assert_callra 0xffffffff
[   92.023048] mlx5_core 0000:00:00.0: print_health_info:388:(pid 0): fw_ver 65535.65535.65535
[   92.023056] mlx5_core 0000:00:00.0: print_health_info:389:(pid 0): hw_id 0xffffffff
[   92.023065] mlx5_core 0000:00:00.0: print_health_info:390:(pid 0): irisc_index 255
[   92.023075] mlx5_core 0000:00:00.0: print_health_info:391:(pid 0): synd 0xff: unrecognized error
[   92.023129] mlx5_core 0000:00:00.0: print_health_info:393:(pid 0): ext_synd 0xffff
[   92.023136] mlx5_core 0000:00:00.0: print_health_info:395:(pid 0): raw fw_ver 0xffffffff
[   92.023823] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=B, anc=0, erc=0, rsid=0
[   92.023859] mlx5_core 0000:00:00.0: PME# disabled
[  102.751227] mlx5_core 0000:00:00.0: poll_health:717:(pid 10): device's health compromised - reached miss count
[  102.751247] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[0] 0xffffffff
[  102.751253] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[1] 0xffffffff
[  102.751260] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[2] 0xffffffff
[  102.751266] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[3] 0xffffffff
[  102.751273] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[4] 0xffffffff
[  102.751280] mlx5_core 0000:00:00.0: print_health_info:383:(pid 10): assert_exit_ptr 0xffffffff
[  102.751287] mlx5_core 0000:00:00.0: print_health_info:385:(pid 10): assert_callra 0xffffffff
[  102.751296] mlx5_core 0000:00:00.0: print_health_info:388:(pid 10): fw_ver 65535.65535.65535
[  102.751303] mlx5_core 0000:00:00.0: print_health_info:389:(pid 10): hw_id 0xffffffff
[  102.751311] mlx5_core 0000:00:00.0: print_health_info:390:(pid 10): irisc_index 255
[  102.751401] mlx5_core 0000:00:00.0: print_health_info:391:(pid 10): synd 0xff: unrecognized error
[  102.751407] mlx5_core 0000:00:00.0: print_health_info:393:(pid 10): ext_synd 0xffff
[  102.751413] mlx5_core 0000:00:00.0: print_health_info:395:(pid 10): raw fw_ver 0xffffffff
[  157.068876] mlx5_core 0000:00:00.0: wait_func:1008:(pid 74): 2RST_QP(0x50a) timeout. Will cause a leak of a command resource
[  157.069145] infiniband mlx5_0: destroy_qp_common:2367:(pid 74): mlx5_ib: modify QP 0x000724 to RESET failed
[  177.548701] mlx5_core 0000:00:00.0: wait_func:1008:(pid 8): QUERY_VPORT_COUNTER(0x770) timeout. Will cause a leak of a command resource
[  218.508382] mlx5_core 0000:00:00.0: wait_func:1008:(pid 74): DESTROY_QP(0x501) timeout. Will cause a leak of a command resource
[  238.988229] mlx5_core 0000:00:00.0: wait_func:1008:(pid 8): QUERY_Q_COUNTER(0x773) timeout. Will cause a leak of a command resource
[  244.108211] INFO: task NetworkManager:681 blocked for more than 122 seconds.
[  244.108446]       Not tainted 5.8.0-rc4 #1
[  244.108450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  244.108456] NetworkManager  D 9240   681      1 0x00000000
[  244.108467] Call Trace:
[  244.108479]  [<00000000b9225c46>] __schedule+0x2d6/0x5a8
[  244.108485]  [<00000000b9225f72>] schedule+0x5a/0x130
[  244.108491]  [<00000000b9226704>] schedule_preempt_disabled+0x2c/0x48
[  244.108497]  [<00000000b92289c2>] __mutex_lock+0x372/0x960
[  244.108503]  [<00000000b9228fe2>] mutex_lock_nested+0x32/0x40
[  244.108565]  [<000003ff8020cbb2>] mlx5e_xdp+0x62/0xd0 [mlx5_core]
[  244.108573]  [<00000000b8fbcc24>] __dev_xdp_query.part.0+0x34/0x70
[  244.108582]  [<00000000b8fe1d3c>] rtnl_xdp_fill+0x84/0x258
[  244.108588]  [<00000000b8fe5f54>] rtnl_fill_ifinfo+0x6a4/0xd68
[  244.108595]  [<00000000b8fe6f3e>] rtnl_getlink+0x266/0x3e8
[  244.108602]  [<00000000b8fe2cce>] rtnetlink_rcv_msg+0x18e/0x4b0
[  244.108618]  [<00000000b904913e>] netlink_rcv_skb+0x4e/0xf8
[  244.108623]  [<00000000b90488b2>] netlink_unicast+0x18a/0x268
[  244.108627]  [<00000000b9048cf6>] netlink_sendmsg+0x366/0x448
[  244.108634]  [<00000000b8f94b4c>] sock_sendmsg+0x64/0x78
[  244.108638]  [<00000000b8f965d6>] ____sys_sendmsg+0x1f6/0x240
[  244.108643]  [<00000000b8f98c04>] ___sys_sendmsg+0x74/0xa8
[  244.108648]  [<00000000b8f98cf4>] __sys_sendmsg+0x64/0xa8
[  244.108652]  [<00000000b8f995b4>] __do_sys_socketcall+0x2dc/0x340
[  244.108676]  [<00000000b922e618>] system_call+0xdc/0x2c0
[  244.108682] 2 locks held by NetworkManager/681:
[  244.108688]  #0: 00000000b972cae8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x160/0x4b0
[  244.108698]  #1: 00000000d3103b80 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_xdp+0x62/0xd0 [mlx5_core]
[  244.108731]
               Showing all locks held in the system:
[  244.108743] 3 locks held by kworker/u128:0/8:
[  244.108749]  #0: 00000000d2f21148 ((wq_completion)mlx5e){+.+.}-{0:0}, at: process_one_work+0x1dc/0x478
[  244.108764]  #1: 000003e000047e18 ((work_completion)(&priv->update_stats_work)){+.+.}-{0:0}, at: process_one_work+0x1dc/0x478
[  244.108771]  #2: 00000000d3103b80 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_update_stats_work+0x34/0x68 [mlx5_core]
[  244.108794] 3 locks held by kworker/1:0/17:
[  244.108798]  #0: 00000000e9922948 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work+0x1dc/0x478
[  244.108805]  #1: 000003e00028fe18 ((addr_chk_work).work){+.+.}-{0:0}, at: process_one_work+0x1dc/0x478
[  244.108813]  #2: 00000000b972cae8 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_verify_work+0x22/0x38
[  244.108822] 1 lock held by khungtaskd/29:
[  244.108826]  #0: 00000000b969a990 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.0+0x0/0x50
[  244.108837] 6 locks held by kmcheck/74:
[  244.108841]  #0: 00000000b9739e48 (crw_handler_mutex){+.+.}-{3:3}, at: crw_collect_info+0x250/0x350
[  244.108849]  #1: 00000000b9719668 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0x26/0x48
[  244.108858]  #2: 00000000d8169238 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x32/0x50
[  244.108866]  #3: 00000000d426cbb8 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x38/0x140 [mlx5_core]
[  244.108887]  #4: 000003ff8024a280 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_unregister_device+0x36/0xc8 [mlx5_core]
[  244.108908]  #5: 00000000cf4f8650 (&device->unregistration_lock){+.+.}-{3:3}, at: __ib_unregister_device+0x32/0xc8 [ib_core]
[  244.108957] 2 locks held by NetworkManager/681:
[  244.108961]  #0: 00000000b972cae8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x160/0x4b0
[  244.108973]  #1: 00000000d3103b80 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_xdp+0x62/0xd0 [mlx5_core]



* Re: mlx5 hot unplug regression on z/VM
  2020-07-06 16:12 mlx5 hot unplug regression on z/VM Niklas Schnelle
@ 2020-07-07  9:08 ` Shay Drory
  0 siblings, 0 replies; 2+ messages in thread
From: Shay Drory @ 2020-07-07  9:08 UTC (permalink / raw)
  To: Niklas Schnelle; +Cc: netdev, Stefan Raspl

Hello Mr. Schnelle,

I have reviewed the code and the log, and I think I understand the bug.
As far as I understand, it is as you pointed out in your mail[1]: the call
order of the two functions needs to be switched. Running
mlx5_drain_health_wq() first prevents new health work from being queued, so
when we later call mlx5_unregister_device() the driver is unaware that the
VF might be missing.
I will start working on a patch to fix this.

[1] https://lkml.org/lkml/2020/6/12/376
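
For illustration, below is a self-contained, heavily simplified C sketch of
the ordering problem as described above. It is not the real mlx5_core
teardown path: the *_stub functions and the two flags are stand-ins invented
for this example, and only the names mlx5_drain_health_wq and
mlx5_unregister_device mirror the functions discussed in this thread.

/*
 * Illustrative stand-ins only -- NOT the real mlx5_core functions.
 */
#include <stdbool.h>
#include <stdio.h>

static bool health_wq_draining;   /* once set, no new health work is queued    */
static bool device_marked_gone;   /* set by health work when the VF disappears */

static void mlx5_drain_health_wq_stub(void)
{
	health_wq_draining = true;
}

static void health_work_stub(void)
{
	/* Models the health work triggered by "Fatal error 1 detected". */
	if (!health_wq_draining)
		device_marked_gone = true;
}

static void mlx5_unregister_device_stub(void)
{
	if (!device_marked_gone)
		printf("teardown issues commands to a missing VF: wait_func() timeouts, hang\n");
	else
		printf("device known to be gone: teardown can bail out quickly\n");
}

int main(void)
{
	/* Current order: drain first, so the health work can no longer mark
	 * the device as gone, and unregister hangs on dead-device commands. */
	mlx5_drain_health_wq_stub();
	health_work_stub();
	mlx5_unregister_device_stub();

	/* Order proposed in the referenced mail: let the health work run (and
	 * mark the device as gone) before draining and unregistering. */
	health_wq_draining = false;
	device_marked_gone = false;
	health_work_stub();
	mlx5_drain_health_wq_stub();
	mlx5_unregister_device_stub();

	return 0;
}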

On 7/6/2020 19:12, Niklas Schnelle wrote:

> Hi Mr. Drory, Hi Netdev List,
>
> I'm the PCI subsystem maintainer for Linux on IBM Z, and since v5.8-rc1
> we've been seeing a regression with hot unplug of ConnectX-4 VFs
> from z/VM guests. In -rc1 this still looked like a simple issue, and
> I wrote the following mail:
> https://lkml.org/lkml/2020/6/12/376
> Sadly, since (I believe) -rc2 I've no longer been able to get this working
> consistently (it did work consistently on -rc1 with the change described in
> that mail). In his reply, Saeed Mahameed pointed me to your commits as
> dealing with similar issues, so I wanted to get some input on how to debug
> this further.
>
> The commands I used to test this are as follows (on a z/VM guest running
> Fedora 31 with a vanilla debug_defconfig v5.8-rc4 kernel); the resulting
> dmesg is attached to this mail:
>
> # vmcp q pcif  // query for available PCI devices
> # vmcp attach pcif <FID> to \* // where <FID> is one of the ones listed by the above command
> # vmcp detach pcif <FID> // This does a hot unplug and is where things start going wrong
>
> I assume you don't have access to the hardware, but I'll be happy to assist
> as well as I can; digging on my own, I sadly don't know enough about the
> mlx5_core driver to make more progress.
>
> Best regards,
> Niklas Schnelle



