All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, pabeni@redhat.com,
	johannes@sipsolutions.net, pablo@netfilter.org,
	laforge@gnumonks.org, Jason@zx2c4.com, jgg@nvidia.com,
	leonro@nvidia.com, Jakub Kicinski <kuba@kernel.org>
Subject: [RFC net-next] net: add ndo_alloc_and_init and ndo_release to replace priv_destructor
Date: Wed, 11 May 2022 12:12:18 -0700	[thread overview]
Message-ID: <20220511191218.1402380-1-kuba@kernel.org> (raw)

Old API
-------
Our current API includes .ndo_init, .ndo_uninit and .priv_destructor.
First two are part of netdev_ops the last one is a member of netdevice
and can be overwritten at will by the drivers.

The semantics are that .ndo_uninit releases anything that
can be freed once the device is unlisted, while .priv_destructor
should release any resources which may be accessed outside
of rtnl_lock, only with a reference on the netdev held.

BTW as far as I can tell there is no strong reason for .ndo_init
to exist. It gets called early during registration after
netdev's name gets filled in, and none of the users I checked
care about the name. They could have as well run the code
they have in .ndo_init before calling register_netdevice().

Problem
-------
.priv_destructor gets called if register_netdevice() fails
after calling .ndo_init. This takes many by surprise. We recommend
to move init into .ndo_init, but .ndo_init intuitively pairs
with .ndo_uninit, not .priv_destructor.

A common workaround is to set .priv_destructor after
register_netdevice() succeeds. This works fine in practice but
is not always correct. Theoretically something may nack the

  call_netdevice_notifiers(NETDEV_REGISTER, dev)

after an entity earlier on the notifier chain has taken a ref
on the netdev.

Also the fact that .priv_destructor is a member of netdevice
is slightly suboptimal, security people like when callbacks
live in read-only memory.

New API
-------
We want an intuitive API, which I think should mean symmetric
ndo callbacks only. There is no point in having two steps at
init time so this patch renames .ndo_init as .ndo_alloc_and_init.
.ndo_uninit remains untouched, while .priv_destructor gets
replaced by .ndo_release. (Logically speaking, obviously, the
old callbacks will get removed once current users are migrated.)

That's best I can come up. Using .ndo_free as a name may seem
like a better option but that may make people think it's called
from free_netdev().

.ndo_alloc_and_init grows an @arg parameter because ovs and
gtp want to pass parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/netdevices.rst | 72 ++++++++++++++-----------
 include/linux/netdevice.h               | 40 ++++++++++----
 net/core/dev.c                          | 18 +++++--
 3 files changed, 84 insertions(+), 46 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 9e4cccb90b87..156417d8ca1f 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -89,7 +89,8 @@ Device management under RTNL
 Registering struct net_device while in context which already holds
 the ``rtnl_lock`` requires extra care. In those scenarios most drivers
 will want to make use of struct net_device's ``needs_free_netdev``
-and ``priv_destructor`` members for freeing of state.
+and ``.ndo_alloc_and_init`` / ``.ndo_uninit`` / ``.ndo_release`` members
+for freeing of the state.
 
 Example flow of netdev handling under ``rtnl_lock``:
 
@@ -100,12 +101,36 @@ and ``priv_destructor`` members for freeing of state.
     dev->needs_free_netdev = true;
   }
 
-  static void my_destructor(struct net_device *dev)
+  static int my_ndo_alloc_and_init(struct net_device *dev, void *arg)
   {
+    struct my_device_priv *priv = netdev_priv(dev);
+    int err;
+
+    priv->obj = some_obj_create();
+    if (!priv->obj)
+      return -ENOMEM;
+
+    err = some_init(priv);
+    if (err)
+      goto err_some_free;
+
+    return 0;
+
+  err_some_uninit:
     some_obj_destroy(priv->obj);
+    return err;
+  }
+
+  static void my_ndo_uninit(struct net_device *dev)
+  {
     some_uninit(priv);
   }
 
+  static void my_ndo_release(struct net_device *dev)
+  {
+    some_obj_destroy(priv->obj);
+  }
+
   int create_link()
   {
     struct my_device_priv *priv;
@@ -118,19 +143,6 @@ and ``priv_destructor`` members for freeing of state.
       return -ENOMEM;
     priv = netdev_priv(dev);
 
-    /* Implicit constructor */
-    err = some_init(priv);
-    if (err)
-      goto err_free_dev;
-
-    priv->obj = some_obj_create();
-    if (!priv->obj) {
-      err = -ENOMEM;
-      goto err_some_uninit;
-    }
-    /* End of constructor, set the destructor: */
-    dev->priv_destructor = my_destructor;
-
     err = register_netdevice(dev);
     if (err)
       /* register_netdevice() calls destructor on failure */
@@ -149,13 +161,9 @@ and ``priv_destructor`` members for freeing of state.
     return err;
   }
 
-If struct net_device.priv_destructor is set it will be called by the core
-some time after unregister_netdevice(), it will also be called if
-register_netdevice() fails. The callback may be invoked with or without
-``rtnl_lock`` held.
-
-There is no explicit constructor callback, driver "constructs" the private
-netdev state after allocating it and before registration.
+``.ndo_alloc_and_init`` prepares and allocates all the state. ``.ndo_uninit``
+cleans up what can be cleaned up while not all references are gone,
+``.ndo_release`` cleans up what needs to wait for references to be released.
 
 Setting struct net_device.needs_free_netdev makes core call free_netdevice()
 automatically after unregister_netdevice() when all references to the device
@@ -171,16 +179,20 @@ will defer some of the processing until ``rtnl_lock`` is released.
 Devices spawned from struct rtnl_link_ops should never free the
 struct net_device directly.
 
-.ndo_init and .ndo_uninit
-~~~~~~~~~~~~~~~~~~~~~~~~~
+.ndo_alloc_and_init, .ndo_uninit and .ndo_release
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``.ndo_alloc_and_init`` and ``.ndo_uninit`` callbacks are called during
+net_device registration and de-registration, under ``rtnl_lock``. Drivers can
+use those e.g. when parts of their init process need to run under ``rtnl_lock``.
 
-``.ndo_init`` and ``.ndo_uninit`` callbacks are called during net_device
-registration and de-registration, under ``rtnl_lock``. Drivers can use
-those e.g. when parts of their init process need to run under ``rtnl_lock``.
+``.ndo_alloc_and_init`` runs before device is visible in the system,
+``.ndo_uninit`` runs during de-registering after device is closed but
+other subsystems may still have outstanding references to the netdevice.
 
-``.ndo_init`` runs before device is visible in the system, ``.ndo_uninit``
-runs during de-registering after device is closed but other subsystems
-may still have outstanding references to the netdevice.
+For freeing state which needs to remain in place until all netdev references
+had been released use ``.ndo_release``. Note, however, that ``.ndo_release``
+does not give guarantees on whether the ``rtnl_lock`` is held.
 
 MTU
 ===
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 050aa4d95a69..97ed639a04f0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1033,16 +1033,6 @@ struct netdev_net_notifier {
  * The following hooks can be defined; unless noted otherwise, they are
  * optional and can be filled with a null pointer.
  *
- * int (*ndo_init)(struct net_device *dev);
- *     This function is called once when a network device is registered.
- *     The network device can use this for any late stage initialization
- *     or semantic validation. It can fail with an error code which will
- *     be propagated back to register_netdev.
- *
- * void (*ndo_uninit)(struct net_device *dev);
- *     This function is called when device is unregistered or when registration
- *     fails. It is not called if init fails.
- *
  * int (*ndo_open)(struct net_device *dev);
  *     This function is called when a network device transitions to the up
  *     state.
@@ -1365,8 +1355,32 @@ struct netdev_net_notifier {
  *	free running cycle counter.
  */
 struct net_device_ops {
+	/*
+	 * @ndo_init:
+	 * @ndo_alloc_and_init:
+	 *	Called once when a network device is registered.
+	 *	The network device can use this for any late stage
+	 *	initialization or semantic validation. It can fail with an error
+	 *	code which will be propagated back to register_netdev().
+	 *	Context: sleeping, RTNL lock held.
+	 * @ndo_uninit:
+	 *	Called when device is unregistered or when registration fails
+	 *	after calling @ndo_alloc_and_init. Outstanding references
+	 *	to the netdev may still exist.
+	 *	Context: sleeping, RTNL lock held.
+	 * @ndo_release:
+	 *	Called when all netdev references had been released when
+	 *	device is unregistered or when registration fails after calling
+	 *	@ndo_init. Use this callback to free device-specific state
+	 *	which may be accessed without the RTNL lock, only a reference
+	 *	on the netdev.
+	 *	Context: sleeping, RTNL *may* be held!
+	 */
 	int			(*ndo_init)(struct net_device *dev);
+	int			(*ndo_alloc_and_init)(struct net_device *dev,
+						      void *arg);
 	void			(*ndo_uninit)(struct net_device *dev);
+	void			(*ndo_release)(struct net_device *dev);
 	int			(*ndo_open)(struct net_device *dev);
 	int			(*ndo_stop)(struct net_device *dev);
 	netdev_tx_t		(*ndo_start_xmit)(struct sk_buff *skb,
@@ -3001,7 +3015,11 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
 	return ret;
 }
 
-int register_netdevice(struct net_device *dev);
+int register_netdevice_arg(struct net_device *dev, void *arg);
+static inline int register_netdevice(struct net_device *dev)
+{
+	return register_netdevice_arg(dev, NULL);
+}
 void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
 void unregister_netdevice_many(struct list_head *head);
 static inline void unregister_netdevice(struct net_device *dev)
diff --git a/net/core/dev.c b/net/core/dev.c
index 91b7e7784da9..c39a623267e0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9927,8 +9927,9 @@ void netif_tx_stop_all_queues(struct net_device *dev)
 EXPORT_SYMBOL(netif_tx_stop_all_queues);
 
 /**
- *	register_netdevice	- register a network device
+ *	register_netdevice_arg() - register a network device
  *	@dev: device to register
+ *	@arg: optional argument for .ndo_alloc_and_init
  *
  *	Take a completed network device structure and add it to the kernel
  *	interfaces. A %NETDEV_REGISTER message is sent to the netdev notifier
@@ -9943,7 +9944,7 @@ EXPORT_SYMBOL(netif_tx_stop_all_queues);
  *	will not get the same name.
  */
 
-int register_netdevice(struct net_device *dev)
+int register_netdevice_arg(struct net_device *dev, void *arg)
 {
 	int ret;
 	struct net *net = dev_net(dev);
@@ -9976,8 +9977,11 @@ int register_netdevice(struct net_device *dev)
 		goto out;
 
 	/* Init, if this function is available */
-	if (dev->netdev_ops->ndo_init) {
-		ret = dev->netdev_ops->ndo_init(dev);
+	if (dev->netdev_ops->ndo_init || dev->netdev_ops->ndo_alloc_and_init) {
+		if (dev->netdev_ops->ndo_init)
+			ret = dev->netdev_ops->ndo_init(dev);
+		else if (dev->netdev_ops->ndo_alloc_and_init)
+			ret = dev->netdev_ops->ndo_alloc_and_init(dev, arg);
 		if (ret) {
 			if (ret > 0)
 				ret = -EIO;
@@ -10102,13 +10106,15 @@ int register_netdevice(struct net_device *dev)
 err_uninit:
 	if (dev->netdev_ops->ndo_uninit)
 		dev->netdev_ops->ndo_uninit(dev);
+	if (dev->netdev_ops->ndo_release)
+		dev->netdev_ops->ndo_release(dev);
 	if (dev->priv_destructor)
 		dev->priv_destructor(dev);
 err_free_name:
 	netdev_name_node_free(dev->name_node);
 	goto out;
 }
-EXPORT_SYMBOL(register_netdevice);
+EXPORT_SYMBOL(register_netdevice_arg);
 
 /**
  *	init_dummy_netdev	- init a dummy network device for NAPI
@@ -10351,6 +10357,8 @@ void netdev_run_todo(void)
 #if IS_ENABLED(CONFIG_DECNET)
 		WARN_ON(dev->dn_ptr);
 #endif
+		if (dev->netdev_ops->ndo_release)
+			dev->netdev_ops->ndo_release(dev);
 		if (dev->priv_destructor)
 			dev->priv_destructor(dev);
 		if (dev->needs_free_netdev)
-- 
2.34.3


             reply	other threads:[~2022-05-11 19:12 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-11 19:12 Jakub Kicinski [this message]
2022-05-11 21:11 ` [RFC net-next] net: add ndo_alloc_and_init and ndo_release to replace priv_destructor Jakub Kicinski
2022-05-11 23:43 ` Jason Gunthorpe
2022-05-13  5:47   ` Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220511191218.1402380-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=Jason@zx2c4.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jgg@nvidia.com \
    --cc=johannes@sipsolutions.net \
    --cc=laforge@gnumonks.org \
    --cc=leonro@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pablo@netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.