linux-kernel.vger.kernel.org archive mirror
* Re: Fix a devlink AB-BA deadlock on net namespace deletion
       [not found] ` <2141729194.2508.1650780142235.JavaMail.xmail@wm-2-new>
@ 2022-04-24  7:08   ` gregkh
  2022-04-25  2:24   ` Parav Pandit
  1 sibling, 0 replies; 2+ messages in thread
From: gregkh @ 2022-04-24  7:08 UTC (permalink / raw)
  To: 张广辉; +Cc: roid, saeedm, parav, jgg, linux-kernel, stable

On Sun, Apr 24, 2022 at 02:02:22PM +0800, 张广辉 wrote:
> 
> Hi  all
> 

<snip>


Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- Your patch is malformed (tabs converted to spaces, linewrapped, etc.)
  and can not be applied.  Please read the file,
  Documentation/email-clients.txt in order to fix this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot



* RE: Fix a devlink AB-BA deadlock on net namespace deletion
       [not found] ` <2141729194.2508.1650780142235.JavaMail.xmail@wm-2-new>
  2022-04-24  7:08   ` Fix a devlink AB-BA deadlock on net namespace deletion gregkh
@ 2022-04-25  2:24   ` Parav Pandit
  1 sibling, 0 replies; 2+ messages in thread
From: Parav Pandit @ 2022-04-25  2:24 UTC (permalink / raw)
  To: 张广辉,
	Roi Dayan, Saeed Mahameed, Jason Gunthorpe, gregkh
  Cc: linux-kernel , stable

Did you audit whether it is safe to traverse pernet_list without holding pernet_ops_rwsem?
When I reviewed this area for this issue several months ago, it appeared that pernet_ops_rwsem must be held while traversing pernet_list.

You also need to fix your mail client to send text-only patches.


From: 张广辉 <zhang.guanghui@cestc.cn> 
Sent: Sunday, April 24, 2022 2:02 AM
To: 张广辉 <zhang.guanghui@cestc.cn>; Roi Dayan <roid@nvidia.com>; Saeed Mahameed <saeedm@nvidia.com>; Parav Pandit <parav@nvidia.com>; Jason Gunthorpe <jgg@nvidia.com>; gregkh <gregkh@linuxfoundation.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>; stable <stable@vger.kernel.org>
Subject: Fix a devlink AB-BA deadlock on net namespace deletion


Hi all,

Deleting a netns holds pernet_ops_rwsem and then takes devlink_mutex.
At the same time, changing the eswitch mode to switchdev holds devlink_mutex, unregisters a netdevice notifier, and then takes pernet_ops_rwsem.
So an AB-BA deadlock can happen. I have made a patch to fix the deadlock, and it works well. Please help review it. Thanks.


Example sequence:
$ ip netns add foo
$ ip netns del foo &
$ devlink dev eswitch set pci/0000:af:00.1 mode switchdev

Process A (netns cleanup worker):
cleanup_net()
  down_read(&pernet_ops_rwsem);          <- A acquires pernet_ops_rwsem
  ops_pre_exit_list()
    pre_exit()
      devlink_pernet_pre_exit()
        mutex_lock(&devlink_mutex);      <- A blocks: devlink_mutex is held by B

Process B (devlink eswitch mode set):
genl_family_rcv_msg_doit()
  pre_doit
    devlink_nl_pre_doit()
      mutex_lock(&devlink_mutex);        <- B acquires devlink_mutex
  devlink_nl_cmd_eswitch_set_doit()
    mlx5_devlink_eswitch_mode_set()
      mlx5_lag_disable_change()
        mlx5_disable_lag()
          mlx5_rescan_drivers_locked()
            device_del()
              ...
              unregister_netdevice_notifier()
                down_write(&pernet_ops_rwsem);  <- B blocks: pernet_ops_rwsem is held by A
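
The inversion in the two call chains above can be illustrated with a small userspace sketch in C. It is not kernel code and not the lockdep API: the `lock_order` structure, `lock_order_init()`, `lock_order_record()` and the enum names are hypothetical helpers invented for this illustration. The sketch only records which lock was requested while another was held, and flags the case where the reverse order was seen earlier. It handles a two-lock inversion only, not longer cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical identifiers standing in for the two kernel locks. */
enum {
    LOCK_PERNET_OPS_RWSEM,
    LOCK_DEVLINK_MUTEX,
    NUM_LOCKS
};

/* after[a][b] is true once some path has requested lock b while holding lock a. */
struct lock_order {
    bool after[NUM_LOCKS][NUM_LOCKS];
};

static void lock_order_init(struct lock_order *lo)
{
    memset(lo, 0, sizeof(*lo));
}

/*
 * Record that `wanted` is requested while `held` is already held.
 * Returns false if the opposite order was recorded earlier, which is
 * the AB-BA pattern; returns true otherwise.
 */
static bool lock_order_record(struct lock_order *lo, int held, int wanted)
{
    assert(held >= 0 && held < NUM_LOCKS);
    assert(wanted >= 0 && wanted < NUM_LOCKS);

    if (lo->after[wanted][held])
        return false;           /* reverse order already seen */

    lo->after[held][wanted] = true;
    return true;
}
```

Under these assumptions, recording the Process A order (pernet_ops_rwsem, then devlink_mutex) succeeds, and recording the Process B order (devlink_mutex, then pernet_ops_rwsem) afterwards returns false, which corresponds to the two hung tasks in the traces.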
 

Deleting netns trace:
[  248.061947] INFO: task kworker/u160:3:1179 blocked for more than 122 seconds.
[  248.061953]       Not tainted 5.15.13-0.el9.x86_64 #1
[  248.061955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.061956] task:kworker/u160:3  state:D stack:    0 pid: 1179 ppid:     2 flags:0x00004000
[  248.061962] Workqueue: netns cleanup_net
[  248.061970] Call Trace:
[  248.061972]  <TASK>
[  248.061975]  __schedule+0x200/0x540
[  248.061982]  schedule+0x44/0xa0
[  248.061984]  schedule_preempt_disabled+0xa/0x10
[  248.061986]  __mutex_lock.constprop.0+0x212/0x400
[  248.061989]  devlink_pernet_pre_exit+0x2a/0x140
[  248.061994]  cleanup_net+0x1d2/0x3a0
[  248.061997]  process_one_work+0x1e8/0x390
[  248.062003]  worker_thread+0x53/0x3c0
[  248.062005]  ? process_one_work+0x390/0x390
[  248.062007]  kthread+0x10c/0x130
[  248.062011]  ? set_kthread_struct+0x40/0x40
[  248.062014]  ret_from_fork+0x1f/0x30
[  248.062020]  </TASK>

Changing mode to switchdev trace:

[  248.062078] task:devlink         state:D stack:    0 pid: 8546 ppid:  8542 flags:0x00004000
[  248.062081] Call Trace:
[  248.062082]  <TASK>
[  248.062083]  __schedule+0x200/0x540
[  248.062087]  ? free_msg+0x3f/0xb0 [mlx5_core]
[  248.062156]  schedule+0x44/0xa0
[  248.062158]  rwsem_down_write_slowpath+0x19c/0x3c0
[  248.062165]  unregister_netdevice_notifier+0x1c/0xb0
[  248.062168]  mlx5_ib_roce_cleanup+0x8a/0x110 [mlx5_ib]
[  248.062184]  mlx5r_remove+0x36/0x60 [mlx5_ib]
[  248.062196]  auxiliary_bus_remove+0x18/0x30
[  248.062200]  __device_release_driver+0x177/0x240
[  248.062203]  device_release_driver+0x24/0x30
[  248.062205]  bus_remove_device+0xd8/0x140
[  248.062210]  device_del+0x18b/0x400
[  248.062213]  mlx5_rescan_drivers_locked.part.0+0x7e/0x150 [mlx5_core]
[  248.062267]  mlx5_disable_lag+0x149/0x160 [mlx5_core]
[  248.062318]  mlx5_lag_disable_change+0x60/0xa0 [mlx5_core]
[  248.062369]  mlx5_devlink_eswitch_mode_set+0x4b/0x1a0 [mlx5_core]
[  248.062436]  devlink_nl_cmd_eswitch_set_doit+0xc1/0x150
[  248.062440]  genl_family_rcv_msg_doit+0xe7/0x150
[  248.062445]  genl_rcv_msg+0xdc/0x1e0
[  248.062448]  ? __devlink_port_phys_port_name_get+0x1e0/0x1e0
[  248.062451]  ? genl_get_cmd+0xd0/0xd0
[  248.062454]  netlink_rcv_skb+0x4e/0xf0
[  248.062457]  genl_rcv+0x24/0x40
[  248.062460]  netlink_unicast+0x1fe/0x2d0
[  248.062463]  netlink_sendmsg+0x24f/0x4b0
[  248.062466]  sock_sendmsg+0x5b/0x60
[  248.062469]  __sys_sendto+0xf0/0x160
[  248.062473]  ? handle_mm_fault+0xbf/0x280
[  248.062478]  ? do_user_addr_fault+0x1d0/0x670
[  248.062482]  __x64_sys_sendto+0x20/0x30
[  248.062484]  do_syscall_64+0x38/0x90
[  248.062487]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  248.062492] RIP: 0033:0x7ff8cc469c3a
[  248.062494] RSP: 002b:00007ffe06025e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  248.062497] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007ff8cc469c3a
[  248.062499] RDX: 0000000000000038 RSI: 000055c261bf7440 RDI: 0000000000000003
[  248.062501] RBP: 0000000000000000 R08: 00007ff8cc52d200 R09: 000000000000000c
[  248.062502] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  248.062503] R13: 000055c261bf72a0 R14: 000055c260a01d5c R15: 000055c261bf7440
[  248.062505]  </TASK>


Patch details:

diff --git a/linux/net/core/net_namespace.c b/linux/net/core/net_namespace.c
index 202fa5eac..5c872db1f 100644
--- a/linux/net/core/net_namespace.c
+++ b/linux/net/core/net_namespace.c
@@ -576,6 +576,7 @@ static void cleanup_net(struct work_struct *work)
                list_add_tail(&net->exit_list, &net_exit_list);
        }

+       up_read(&pernet_ops_rwsem);
        /* Run all of the network namespace pre_exit methods */
        list_for_each_entry_reverse(ops, &pernet_list, list)
                ops_pre_exit_list(ops, &net_exit_list);
@@ -596,7 +597,6 @@ static void cleanup_net(struct work_struct *work)
        list_for_each_entry_reverse(ops, &pernet_list, list)
                ops_free_list(ops, &net_exit_list);

-       up_read(&pernet_ops_rwsem);

        /* Ensure there are no outstanding rcu callbacks using this
         * network namespace.
 


