* mlx5 hot unplug regression on z/VM
@ 2020-07-06 16:12 Niklas Schnelle
2020-07-07 9:08 ` Shay Drory
0 siblings, 1 reply; 2+ messages in thread
From: Niklas Schnelle @ 2020-07-06 16:12 UTC (permalink / raw)
To: Shay Drory; +Cc: netdev, Stefan Raspl
[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]
Hi Mr. Drory, Hi Netdev List,
I'm the PCI Subsystem maintainer for Linux on IBM Z and since v5.8-rc1
we've been seeing a regression with hot unplug of ConnectX-4 VFs
from z/VM guests. In -rc1 this still looked like a simple issue and
I wrote the following mail:
https://lkml.org/lkml/2020/6/12/376
Sadly, since (I believe) -rc2 I have not been able to get this working consistently
anymore (it did work consistently with the change described above on -rc1).
In his answer, Saeed Mahameed pointed me to your commits as dealing with
similar issues, so I wanted to get some input on how to debug this
further.
The commands I used to test this are as follows (on a z/VM guest running
vanilla debug_defconfig v5.8-rc4 installed on Fedora 31); the resulting
dmesg is attached to this mail:
# vmcp q pcif // query for available PCI devices
# vmcp attach pcif <FID> to \* // where <FID> is one of the ones listed by the above command
# vmcp detach pcif <FID> // This does a hot unplug and is where things start going wrong
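To run the cycle repeatedly, I wrap the steps above in a small helper (just a sketch; the sleep duration is a guess to give mlx5_core time to finish probing):

```shell
#!/bin/sh
# Sketch of the reproduction cycle above; assumes the s390-tools `vmcp`
# utility and takes a FID as reported by `vmcp q pcif` as its argument.
repro_cycle() {
    fid="$1"
    vmcp attach pcif "$fid" to '*' || return 1  # hot plug into this guest
    sleep 5                                     # let mlx5_core finish probing
    vmcp detach pcif "$fid"                     # hot unplug; the hang starts here
}
```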
I guess you don't have access to the hardware, but I'll be happy to assist
as best I can; digging on my own, I sadly don't know
enough about the mlx5_core driver to make more progress.
Best regards,
Niklas Schnelle
[-- Attachment #2: dmesg_mlx5_detach_zvm.txt --]
[-- Type: text/plain, Size: 7690 bytes --]
[ 70.773478] IPv6: ADDRCONF(NETDEV_CHANGE): ens514np0: link becomes ready
[ 70.858688] RPC: Registered named UNIX socket transport module.
[ 70.858709] RPC: Registered udp transport module.
[ 70.858714] RPC: Registered tcp transport module.
[ 70.858719] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 70.922689] RPC: Registered rdma transport module.
[ 70.922724] RPC: Registered rdma backchannel transport module.
[ 92.022705] mlx5_core 0000:00:00.0: poll_health:702:(pid 0): Fatal error 1 detected
[ 92.022995] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[0] 0xffffffff
[ 92.023002] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[1] 0xffffffff
[ 92.023009] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[2] 0xffffffff
[ 92.023016] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[3] 0xffffffff
[ 92.023023] mlx5_core 0000:00:00.0: print_health_info:380:(pid 0): assert_var[4] 0xffffffff
[ 92.023030] mlx5_core 0000:00:00.0: print_health_info:383:(pid 0): assert_exit_ptr 0xffffffff
[ 92.023038] mlx5_core 0000:00:00.0: print_health_info:385:(pid 0): assert_callra 0xffffffff
[ 92.023048] mlx5_core 0000:00:00.0: print_health_info:388:(pid 0): fw_ver 65535.65535.65535
[ 92.023056] mlx5_core 0000:00:00.0: print_health_info:389:(pid 0): hw_id 0xffffffff
[ 92.023065] mlx5_core 0000:00:00.0: print_health_info:390:(pid 0): irisc_index 255
[ 92.023075] mlx5_core 0000:00:00.0: print_health_info:391:(pid 0): synd 0xff: unrecognized error
[ 92.023129] mlx5_core 0000:00:00.0: print_health_info:393:(pid 0): ext_synd 0xffff
[ 92.023136] mlx5_core 0000:00:00.0: print_health_info:395:(pid 0): raw fw_ver 0xffffffff
[ 92.023823] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=B, anc=0, erc=0, rsid=0
[ 92.023859] mlx5_core 0000:00:00.0: PME# disabled
[ 102.751227] mlx5_core 0000:00:00.0: poll_health:717:(pid 10): device's health compromised - reached miss count
[ 102.751247] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[0] 0xffffffff
[ 102.751253] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[1] 0xffffffff
[ 102.751260] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[2] 0xffffffff
[ 102.751266] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[3] 0xffffffff
[ 102.751273] mlx5_core 0000:00:00.0: print_health_info:380:(pid 10): assert_var[4] 0xffffffff
[ 102.751280] mlx5_core 0000:00:00.0: print_health_info:383:(pid 10): assert_exit_ptr 0xffffffff
[ 102.751287] mlx5_core 0000:00:00.0: print_health_info:385:(pid 10): assert_callra 0xffffffff
[ 102.751296] mlx5_core 0000:00:00.0: print_health_info:388:(pid 10): fw_ver 65535.65535.65535
[ 102.751303] mlx5_core 0000:00:00.0: print_health_info:389:(pid 10): hw_id 0xffffffff
[ 102.751311] mlx5_core 0000:00:00.0: print_health_info:390:(pid 10): irisc_index 255
[ 102.751401] mlx5_core 0000:00:00.0: print_health_info:391:(pid 10): synd 0xff: unrecognized error
[ 102.751407] mlx5_core 0000:00:00.0: print_health_info:393:(pid 10): ext_synd 0xffff
[ 102.751413] mlx5_core 0000:00:00.0: print_health_info:395:(pid 10): raw fw_ver 0xffffffff
[ 157.068876] mlx5_core 0000:00:00.0: wait_func:1008:(pid 74): 2RST_QP(0x50a) timeout. Will cause a leak of a command resource
[ 157.069145] infiniband mlx5_0: destroy_qp_common:2367:(pid 74): mlx5_ib: modify QP 0x000724 to RESET failed
[ 177.548701] mlx5_core 0000:00:00.0: wait_func:1008:(pid 8): QUERY_VPORT_COUNTER(0x770) timeout. Will cause a leak of a command resource
[ 218.508382] mlx5_core 0000:00:00.0: wait_func:1008:(pid 74): DESTROY_QP(0x501) timeout. Will cause a leak of a command resource
[ 238.988229] mlx5_core 0000:00:00.0: wait_func:1008:(pid 8): QUERY_Q_COUNTER(0x773) timeout. Will cause a leak of a command resource
[ 244.108211] INFO: task NetworkManager:681 blocked for more than 122 seconds.
[ 244.108446] Not tainted 5.8.0-rc4 #1
[ 244.108450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 244.108456] NetworkManager D 9240 681 1 0x00000000
[ 244.108467] Call Trace:
[ 244.108479] [<00000000b9225c46>] __schedule+0x2d6/0x5a8
[ 244.108485] [<00000000b9225f72>] schedule+0x5a/0x130
[ 244.108491] [<00000000b9226704>] schedule_preempt_disabled+0x2c/0x48
[ 244.108497] [<00000000b92289c2>] __mutex_lock+0x372/0x960
[ 244.108503] [<00000000b9228fe2>] mutex_lock_nested+0x32/0x40
[ 244.108565] [<000003ff8020cbb2>] mlx5e_xdp+0x62/0xd0 [mlx5_core]
[ 244.108573] [<00000000b8fbcc24>] __dev_xdp_query.part.0+0x34/0x70
[ 244.108582] [<00000000b8fe1d3c>] rtnl_xdp_fill+0x84/0x258
[ 244.108588] [<00000000b8fe5f54>] rtnl_fill_ifinfo+0x6a4/0xd68
[ 244.108595] [<00000000b8fe6f3e>] rtnl_getlink+0x266/0x3e8
[ 244.108602] [<00000000b8fe2cce>] rtnetlink_rcv_msg+0x18e/0x4b0
[ 244.108618] [<00000000b904913e>] netlink_rcv_skb+0x4e/0xf8
[ 244.108623] [<00000000b90488b2>] netlink_unicast+0x18a/0x268
[ 244.108627] [<00000000b9048cf6>] netlink_sendmsg+0x366/0x448
[ 244.108634] [<00000000b8f94b4c>] sock_sendmsg+0x64/0x78
[ 244.108638] [<00000000b8f965d6>] ____sys_sendmsg+0x1f6/0x240
[ 244.108643] [<00000000b8f98c04>] ___sys_sendmsg+0x74/0xa8
[ 244.108648] [<00000000b8f98cf4>] __sys_sendmsg+0x64/0xa8
[ 244.108652] [<00000000b8f995b4>] __do_sys_socketcall+0x2dc/0x340
[ 244.108676] [<00000000b922e618>] system_call+0xdc/0x2c0
[ 244.108682] 2 locks held by NetworkManager/681:
[ 244.108688] #0: 00000000b972cae8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x160/0x4b0
[ 244.108698] #1: 00000000d3103b80 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_xdp+0x62/0xd0 [mlx5_core]
[ 244.108731]
Showing all locks held in the system:
[ 244.108743] 3 locks held by kworker/u128:0/8:
[ 244.108749] #0: 00000000d2f21148 ((wq_completion)mlx5e){+.+.}-{0:0}, at: process_one_work+0x1dc/0x478
[ 244.108764] #1: 000003e000047e18 ((work_completion)(&priv->update_stats_work)){+.+.}-{0:0}, at: process_one_work+0x1dc/0x478
[ 244.108771] #2: 00000000d3103b80 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_update_stats_work+0x34/0x68 [mlx5_core]
[ 244.108794] 3 locks held by kworker/1:0/17:
[ 244.108798] #0: 00000000e9922948 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work+0x1dc/0x478
[ 244.108805] #1: 000003e00028fe18 ((addr_chk_work).work){+.+.}-{0:0}, at: process_one_work+0x1dc/0x478
[ 244.108813] #2: 00000000b972cae8 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_verify_work+0x22/0x38
[ 244.108822] 1 lock held by khungtaskd/29:
[ 244.108826] #0: 00000000b969a990 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.0+0x0/0x50
[ 244.108837] 6 locks held by kmcheck/74:
[ 244.108841] #0: 00000000b9739e48 (crw_handler_mutex){+.+.}-{3:3}, at: crw_collect_info+0x250/0x350
[ 244.108849] #1: 00000000b9719668 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0x26/0x48
[ 244.108858] #2: 00000000d8169238 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x32/0x50
[ 244.108866] #3: 00000000d426cbb8 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x38/0x140 [mlx5_core]
[ 244.108887] #4: 000003ff8024a280 (mlx5_intf_mutex){+.+.}-{3:3}, at: mlx5_unregister_device+0x36/0xc8 [mlx5_core]
[ 244.108908] #5: 00000000cf4f8650 (&device->unregistration_lock){+.+.}-{3:3}, at: __ib_unregister_device+0x32/0xc8 [ib_core]
[ 244.108957] 2 locks held by NetworkManager/681:
[ 244.108961] #0: 00000000b972cae8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x160/0x4b0
[ 244.108973] #1: 00000000d3103b80 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_xdp+0x62/0xd0 [mlx5_core]
* Re: mlx5 hot unplug regression on z/VM
2020-07-06 16:12 mlx5 hot unplug regression on z/VM Niklas Schnelle
@ 2020-07-07 9:08 ` Shay Drory
0 siblings, 0 replies; 2+ messages in thread
From: Shay Drory @ 2020-07-07 9:08 UTC (permalink / raw)
To: Niklas Schnelle; +Cc: netdev, Stefan Raspl
Hello Mr. Schnelle,
I have reviewed the code and the log, and I think I understand what the bug is.
As far as I understand, it is what you pointed out in your mail [1]: the call order of the two functions got switched.
Running mlx5_drain_health_wq() prevents new health work from being queued, so when we call
mlx5_unregister_device() the driver is unaware that the VF might be missing.
I will start working on a patch to fix this.
[1] https://lkml.org/lkml/2020/6/12/376
On 7/6/2020 19:12, Niklas Schnelle wrote:
> Hi Mr. Drory, Hi Netdev List,
>
> I'm the PCI Subsystem maintainer for Linux on IBM Z and since v5.8-rc1
> we've been seeing a regression with hot unplug of ConnectX-4 VFs
> from z/VM guests. In -rc1 this still looked like a simple issue and
> I wrote the following mail:
> https://lkml.org/lkml/2020/6/12/376
> Sadly, since (I believe) -rc2 I have not been able to get this working consistently
> anymore (it did work consistently with the change described above on -rc1).
> In his answer, Saeed Mahameed pointed me to your commits as dealing with
> similar issues, so I wanted to get some input on how to debug this
> further.
>
> The commands I used to test this are as follows (on a z/VM guest running
> vanilla debug_defconfig v5.8-rc4 installed on Fedora 31); the resulting
> dmesg is attached to this mail:
>
> # vmcp q pcif // query for available PCI devices
> # vmcp attach pcif <FID> to \* // where <FID> is one of the ones listed by the above command
> # vmcp detach pcif <FID> // This does a hot unplug and is where things start going wrong
>
> I guess you don't have access to the hardware, but I'll be happy to assist
> as best I can; digging on my own, I sadly don't know
> enough about the mlx5_core driver to make more progress.
>
> Best regards,
> Niklas Schnelle