From: Antoine Tenart <atenart@kernel.org>
To: davem@davemloft.net, kuba@kernel.org
Cc: Antoine Tenart <atenart@kernel.org>,
	pabeni@redhat.com, gregkh@linuxfoundation.org,
	ebiederm@xmission.com, stephen@networkplumber.org,
	herbert@gondor.apana.org.au, juri.lelli@redhat.com,
	netdev@vger.kernel.org
Subject: [RFC PATCH net-next 8/9] net: delay device_del until run_todo
Date: Tue, 28 Sep 2021 14:54:59 +0200
Message-ID: <20210928125500.167943-9-atenart@kernel.org>
In-Reply-To: <20210928125500.167943-1-atenart@kernel.org>

Move the deletion of the device from unregister_netdevice_many to
netdev_run_todo and move it outside the rtnl lock.

An ABBA deadlock between net-sysfs and the netdevice unregistration was
reported 12 years ago[1]. The issue was the following:

              A                            B

   unregister_netdevice_many         sysfs access
   rtnl_lock                         sysfs refcount
                                     rtnl_lock
   drain sysfs files
   => waits for B                    => waits for A

This was avoided thanks to two patches[2][3], which used rtnl_trylock in
net-sysfs and restarted the syscall when the rtnl lock was already
taken. This way kernfs nodes were not blocking the netdevice
unregistration anymore.

This was fine at the time but is now causing issues: creating and
moving interfaces makes userspace (systemd, NetworkManager or others)
spin a lot as syscalls are restarted, which hurts performance. This
happens for example when creating pods. While userspace applications
could be improved, fixing this in-kernel has the benefit of addressing
the root cause of the issue.

The sysfs removal is done in device_del, and moving it outside of the
rtnl lock fixes the initial deadlock. With that, the trylock/restart
logic can be removed in a follow-up patch.

[1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
(I'm referencing the full thread but the sysfs issue was discussed later
in the thread).
[2] 336ca57c3b4e ("net-sysfs: Use rtnl_trylock in sysfs methods.")
[3] 5a5990d3090b ("net: Avoid race between network down and sysfs")

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
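
For readers unfamiliar with the trylock/restart pattern mentioned in the
commit message, here is a minimal userspace sketch of why it makes
callers spin. It is only an illustration under stated assumptions, not
kernel code: fake_rtnl, fake_sysfs_show and read_with_restarts are
hypothetical names, a pthread mutex stands in for the rtnl lock, and
-EAGAIN stands in for the value returned by restart_syscall().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the rtnl lock (assumption, not the kernel's). */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/*
 * Sketch of a net-sysfs style handler: it never sleeps on the lock.
 * If the lock is already taken it asks the caller to retry, which
 * models "if (!rtnl_trylock()) return restart_syscall();".
 */
int fake_sysfs_show(int *value_out)
{
	assert(value_out != NULL);

	if (pthread_mutex_trylock(&fake_rtnl) != 0)
		return -EAGAIN;	/* lock busy: caller has to start over */

	*value_out = 42;	/* placeholder for reading device state */
	pthread_mutex_unlock(&fake_rtnl);
	return 0;
}

/*
 * Models the userspace side: the read is retried until it succeeds.
 * The lock holder is simulated by releasing fake_rtnl once
 * 'unlock_after' restarts have happened. Returns the restart count.
 */
int read_with_restarts(int unlock_after, int *value_out)
{
	int restarts = 0;

	while (fake_sysfs_show(value_out) == -EAGAIN) {
		restarts++;
		if (restarts == unlock_after)
			pthread_mutex_unlock(&fake_rtnl);
	}
	return restarts;
}
```

Every failed attempt in the loop is wasted work for the caller, which
is the spinning described above. Once device_del runs outside of the
rtnl lock, a handler of this kind could take the lock with a blocking
call instead, since draining the sysfs files no longer happens while
the lock is held.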
 net/core/dev.c       | 2 ++
 net/core/net-sysfs.c | 2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a1eab120bb50..d774fbec5d63 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10593,6 +10593,8 @@ void netdev_run_todo(void)
 			continue;
 		}
 
+		device_del(&dev->dev);
+
 		dev->reg_state = NETREG_UNREGISTERED;
 
 		netdev_wait_allrefs(dev);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 21c3fdeccf20..e754f00c117b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1955,8 +1955,6 @@ void netdev_unregister_kobject(struct net_device *ndev)
 	remove_queue_kobjects(ndev);
 
 	pm_runtime_set_memalloc_noio(dev, false);
-
-	device_del(dev);
 }
 
 /* Create sysfs entries for network device. */
-- 
2.31.1

