netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Antoine Tenart <atenart@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: davem@davemloft.net, pabeni@redhat.com,
	gregkh@linuxfoundation.org, ebiederm@xmission.com,
	stephen@networkplumber.org, herbert@gondor.apana.org.au,
	juri.lelli@redhat.com, netdev@vger.kernel.org
Subject: Re: [RFC PATCH net-next 8/9] net: delay device_del until run_todo
Date: Wed, 29 Sep 2021 19:31:56 +0200	[thread overview]
Message-ID: <163293671647.3047.7240482794798716272@kwain> (raw)
In-Reply-To: <20210929063126.4a702dbd@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

Quoting Jakub Kicinski (2021-09-29 15:31:26)
> On Wed, 29 Sep 2021 10:26:35 +0200 Antoine Tenart wrote:
> > Quoting Jakub Kicinski (2021-09-29 02:02:29)
> > > On Tue, 28 Sep 2021 14:54:59 +0200 Antoine Tenart wrote:  
> > > > The sysfs removal is done in device_del, and moving it outside of the
> > > > rtnl lock does fix the initial deadlock. With that the trylock/restart
> > > > logic can be removed in a following-up patch.  
> > >   
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index a1eab120bb50..d774fbec5d63 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -10593,6 +10593,8 @@ void netdev_run_todo(void)
> > > >                       continue;
> > > >               }
> > > >  
> > > > +             device_del(&dev->dev);
> > > > +
> > > >               dev->reg_state = NETREG_UNREGISTERED;
> > > >  
> > > >               netdev_wait_allrefs(dev);
> > > > diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> > > > index 21c3fdeccf20..e754f00c117b 100644
> > > > --- a/net/core/net-sysfs.c
> > > > +++ b/net/core/net-sysfs.c
> > > > @@ -1955,8 +1955,6 @@ void netdev_unregister_kobject(struct net_device *ndev)
> > > >       remove_queue_kobjects(ndev);
> > > >  
> > > >       pm_runtime_set_memalloc_noio(dev, false);
> > > > -
> > > > -     device_del(dev);
> > > >  }
> > > >  
> > > >  /* Create sysfs entries for network device. */  
> > > 
> > > Doesn't this mean there may be sysfs files which are accessible 
> > > for an unregistered netdevice?  
> > 
> > It would mean having accessible sysfs files for a device in the
> > NETREG_UNREGISTERING state; NETREG_UNREGISTERED still comes after
> > device_del. It's a small difference but still important, I think.
> > 
> > You raise a good point. Yes, that would mean accessing attributes of net
> > devices being unregistered, meaning accessing or modifying unused or
> > obsolete parameters and data (it shouldn't be garbage data though).
> > Unlisting those sysfs files without removing them would be better here,
> > to not expose files when the device is being unregistered while still
> > allowing pending operations to complete. I don't know if that is doable
> > in sysfs.
> 
> I wonder. Do we somehow remove the queue objects without waiting or are
> those also waited on when we remove the device? 'Cause XPS is the part
> that jumps out to me - we reset XPS after netdev_unregister_kobject().
> Does it mean user can re-instate XPS settings after we thought we
> already reset them?

This should be possible yes (and not really wanted).

> Well, it's a little wobbly but I think the direction is sane.
> It wouldn't feel super clean to add
> 
>         if (dev->state != NETREG_REGISTERED)
>                 goto out;
> 
> to the sysfs handlers but maybe it's better than leaving potential
> traps for people who are not aware of the intricacies later? Not sure.

Agreed, that doesn't feel super clean, but would be quite nice to have
for users (and e.g. would also help in the XPS case). Having a wrapper
should be possible, to minimize the impact and make it a bit better.

> > (While I did ran stress tests reading/writing attributes while
> > unregistering devices, I think I missed an issue with the
> > netdev_queue_default attributes; which hopefully can be fixed — if the
> > whole idea is deemed acceptable).

I had a quick look about queue attributes, their removal should also be
done in run_todo (that's easy). However the queues can be updated in
flight (while holding the rtnl lock) and the error paths[1][2] do drain
sysfs files (in kobject_put).

We can't release the rtnl lock here. It should be possible to delay this
outside the rtnl lock (in the global workqueue) but as the kobject are
embedded in the queues, we might need to have them live outside to allow
async releases while a net device (and ->_rx/->_tx) is being freed[3].
That adds to the complexity...

[1] https://elixir.bootlin.com/linux/latest/source/net/core/net-sysfs.c#L1662
[2] https://elixir.bootlin.com/linux/latest/source/net/core/net-sysfs.c#L1067
[3] Or having a dedicated workqueue and draining it.

> > > Isn't the point of having device_del() under rtnl_lock() to make sure
> > > we sysfs handlers can't run on dead devices?  
> > 
> > Hard to say what was the initial point, there is a lot of history here
> > :) I'm not sure it was done because of a particular reason; IMHO it just
> > made sense to make this simple without having a good reason not to do
> > so. And it helped with the naming collision detection.
> 
> FWIW the other two pieces of feedback I have is try to avoid the
> synchronize_net() in patch 7

I wasn't too happy in adding a call to this. However the node name list
is rcu protected and to make sure all CPUs see the removal before
freeing the node name a call to synchronize_net (synchronize_rcu) is
needed. That being said I think we can just call kfree_rcu instead of
netdev_name_node_free (kfree) here.

> add a new helper for the name checking, which would return bool. The
> callers don't have any business getting the struct.

Good idea. Also I think the replacement of __dev_get_by_name by a new
wrapper might be good even outside of this series (in case the series is
delayed / reworked heavily / etc).

Thanks!
Antoine

  reply	other threads:[~2021-09-29 17:32 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-28 12:54 [RFC PATCH net-next 0/9] Userspace spinning on net-sysfs access Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 1/9] net-sysfs: try not to restart the syscall if it will fail eventually Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 2/9] net: split unlisting the net device from unlisting its node name Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 3/9] net: export netdev_name_node_lookup Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 4/9] bonding: use the correct function to check for netdev name collision Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 5/9] ppp: " Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 6/9] net: " Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 7/9] net: delay the removal of the name nodes until run_todo Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 8/9] net: delay device_del " Antoine Tenart
2021-09-29  0:02   ` Jakub Kicinski
2021-09-29  8:26     ` Antoine Tenart
2021-09-29 13:31       ` Jakub Kicinski
2021-09-29 17:31         ` Antoine Tenart [this message]
2021-10-29  9:04           ` Antoine Tenart
2021-10-05 15:21         ` Antoine Tenart
2021-10-05 18:34           ` Jakub Kicinski
2021-09-28 12:55 ` [RFC PATCH net-next 9/9] net-sysfs: remove the use of rtnl_trylock/restart_syscall Antoine Tenart
2021-10-06  6:45   ` Michal Hocko
2021-10-06  8:03     ` Antoine Tenart
2021-10-06  8:55       ` Michal Hocko
2021-10-06  6:37 ` [RFC PATCH net-next 0/9] Userspace spinning on net-sysfs access Michal Hocko
2021-10-06  7:59   ` Antoine Tenart
2021-10-06  8:35     ` Michal Hocko
2021-10-29 14:33 ` Antoine Tenart
2021-10-29 15:45   ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=163293671647.3047.7240482794798716272@kwain \
    --to=atenart@kernel.org \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=juri.lelli@redhat.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).