All of lore.kernel.org
 help / color / mirror / Atom feed
From: Antoine Tenart <atenart@kernel.org>
To: Jakub Kicinski <kuba@kernel.org>
Cc: davem@davemloft.net, pabeni@redhat.com,
	gregkh@linuxfoundation.org, ebiederm@xmission.com,
	stephen@networkplumber.org, herbert@gondor.apana.org.au,
	juri.lelli@redhat.com, netdev@vger.kernel.org, mhocko@suse.com
Subject: Re: [RFC PATCH net-next 8/9] net: delay device_del until run_todo
Date: Fri, 29 Oct 2021 11:04:26 +0200	[thread overview]
Message-ID: <163549826664.3523.4140191764737040064@kwain> (raw)
In-Reply-To: <163293671647.3047.7240482794798716272@kwain>

Quoting Antoine Tenart (2021-09-29 19:31:56)
> Quoting Jakub Kicinski (2021-09-29 15:31:26)
> > On Wed, 29 Sep 2021 10:26:35 +0200 Antoine Tenart wrote:
> 
> > > (While I did ran stress tests reading/writing attributes while
> > > unregistering devices, I think I missed an issue with the
> > > netdev_queue_default attributes; which hopefully can be fixed — if the
> > > whole idea is deemed acceptable).
> 
> I had a quick look about queue attributes, their removal should also be
> done in run_todo (that's easy). However the queues can be updated in
> flight (while holding the rtnl lock) and the error paths[1][2] do drain
> sysfs files (in kobject_put).
> 
> We can't release the rtnl lock here. It should be possible to delay this
> outside the rtnl lock (in the global workqueue) but as the kobject are
> embedded in the queues, we might need to have them live outside to allow
> async releases while a net device (and ->_rx/->_tx) is being freed[3].
> That adds to the complexity...
> 
> [1] https://elixir.bootlin.com/linux/latest/source/net/core/net-sysfs.c#L1662
> [2] https://elixir.bootlin.com/linux/latest/source/net/core/net-sysfs.c#L1067
> [3] Or having a dedicated workqueue and draining it.

I got back to this and while all other suggestions where easy enough to
get right, handling the queue sysfs files was not... The explanation is
the same for Tx and Rx queues.

Sysfs queue files are special: in addition to being created at device
registration and removed at unregistration, they also can be
reconfigured (added & removed) in-flight. This is done under the rtnl
lock. So for those sysfs files the trylock/restart construction also
helps in not hitting a deadlock while removing queues in-flight.

To make this work here, I had to delay the queue removal outside the
rtnl lock. As the queues are statically allocated in net_device->_tx, I
made them dynamically allocated to allow delaying their removal outside
the rtnl lock (in a workqueue for example).

This worked for allowing the removal of queues not to hit the ABBA
deadlock. (Extra logic to drain the queues before device removal might
be needed too). But this introduced an issue as naming collision between
queues was now possible (if we tried registering new queues while old
ones weren't unregistered yet). This is because the unregistration was
delayed, and for this to work the reconfiguration of queues should be
atomic under the rtnl lock; which is exactly what the changes to not hit
the ABBA deadlock requires... There are contradicting goals here.

This is not really fixable IMHO. First we would need to add a naming
collision logic for queues only, but then we don't have the same
two-step unregistration logic as we have for net devices. And failing on
queues reconfigurations for this doesn't seem acceptable to me (this has
a lot of implications, many "users" can request queues registration &
unregistration). Plus the complexity starts to be quite noticeable,
which doesn't help maintenance.

The above looks like a dead end to me. I tried several approaches to
better handle the queues in sysfs, but couldn't find a solution not
hitting an issue one way or another.

I have however a few other ideas, that may or may not be acceptable.
I'll start a dedicated answer to this thread to discuss those.

Thanks,
Antoine

  reply	other threads:[~2021-10-29  9:04 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-28 12:54 [RFC PATCH net-next 0/9] Userspace spinning on net-sysfs access Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 1/9] net-sysfs: try not to restart the syscall if it will fail eventually Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 2/9] net: split unlisting the net device from unlisting its node name Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 3/9] net: export netdev_name_node_lookup Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 4/9] bonding: use the correct function to check for netdev name collision Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 5/9] ppp: " Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 6/9] net: " Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 7/9] net: delay the removal of the name nodes until run_todo Antoine Tenart
2021-09-28 12:54 ` [RFC PATCH net-next 8/9] net: delay device_del " Antoine Tenart
2021-09-29  0:02   ` Jakub Kicinski
2021-09-29  8:26     ` Antoine Tenart
2021-09-29 13:31       ` Jakub Kicinski
2021-09-29 17:31         ` Antoine Tenart
2021-10-29  9:04           ` Antoine Tenart [this message]
2021-10-05 15:21         ` Antoine Tenart
2021-10-05 18:34           ` Jakub Kicinski
2021-09-28 12:55 ` [RFC PATCH net-next 9/9] net-sysfs: remove the use of rtnl_trylock/restart_syscall Antoine Tenart
2021-10-06  6:45   ` Michal Hocko
2021-10-06  8:03     ` Antoine Tenart
2021-10-06  8:55       ` Michal Hocko
2021-10-06  6:37 ` [RFC PATCH net-next 0/9] Userspace spinning on net-sysfs access Michal Hocko
2021-10-06  7:59   ` Antoine Tenart
2021-10-06  8:35     ` Michal Hocko
2021-10-29 14:33 ` Antoine Tenart
2021-10-29 15:45   ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=163549826664.3523.4140191764737040064@kwain \
    --to=atenart@kernel.org \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=juri.lelli@redhat.com \
    --cc=kuba@kernel.org \
    --cc=mhocko@suse.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.