All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC net-next] net: add ndo_alloc_and_init and ndo_release to replace priv_destructor
@ 2022-05-11 19:12 Jakub Kicinski
  2022-05-11 21:11 ` Jakub Kicinski
  2022-05-11 23:43 ` Jason Gunthorpe
  0 siblings, 2 replies; 4+ messages in thread
From: Jakub Kicinski @ 2022-05-11 19:12 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, johannes, pablo, laforge, Jason, jgg,
	leonro, Jakub Kicinski

Old API
-------
Our current API includes .ndo_init, .ndo_uninit and .priv_destructor.
First two are part of netdev_ops the last one is a member of netdevice
and can be overwritten at will by the drivers.

The semantics are that .ndo_uninit releases anything that
can be freed once the device is unlisted, while .priv_destructor
should release any resources which may be accessed outside
of rtnl_lock, only with a reference on the netdev held.

BTW as far as I can tell there is no strong reason for .ndo_init
to exist. It gets called early during registration after
netdev's name gets filled in, and none of the users I checked
care about the name. They could have as well run the code
they have in .ndo_init before calling register_netdevice().

Problem
-------
.priv_destructor gets called if register_netdevice() fails
after calling .ndo_init. This takes many by surprise. We recommend
to move init into .ndo_init, but .ndo_init intuitively pairs
with .ndo_uninit, not .priv_destructor.

A common workaround is to set .priv_destructor after
register_netdevice() succeeds. This works fine in practice but
is not always correct. Theoretically something may nack the

  call_netdevice_notifiers(NETDEV_REGISTER, dev)

after an entity earlier on the notifier chain has taken a ref
on the netdev.

Also the fact that .priv_destructor is a member of netdevice
is slightly suboptimal, security people like when callbacks
live in read-only memory.

New API
-------
We want an intuitive API, which I think should mean symmetric
ndo callbacks only. There is no point in having two steps at
init time so this patch renames .ndo_init as .ndo_alloc_and_init.
.ndo_uninit remains untouched, while .priv_destructor gets
replaced by .ndo_release. (Logically speaking, obviously, the
old callbacks will get removed once current users are migrated.)

That's best I can come up. Using .ndo_free as a name may seem
like a better option but that may make people think it's called
from free_netdev().

.ndo_alloc_and_init grows an @arg parameter because ovs and
gtp want to pass parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/netdevices.rst | 72 ++++++++++++++-----------
 include/linux/netdevice.h               | 40 ++++++++++----
 net/core/dev.c                          | 18 +++++--
 3 files changed, 84 insertions(+), 46 deletions(-)

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 9e4cccb90b87..156417d8ca1f 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -89,7 +89,8 @@ Device management under RTNL
 Registering struct net_device while in context which already holds
 the ``rtnl_lock`` requires extra care. In those scenarios most drivers
 will want to make use of struct net_device's ``needs_free_netdev``
-and ``priv_destructor`` members for freeing of state.
+and ``.ndo_alloc_and_init`` / ``.ndo_uninit`` / ``.ndo_release`` members
+for freeing of the state.
 
 Example flow of netdev handling under ``rtnl_lock``:
 
@@ -100,12 +101,36 @@ and ``priv_destructor`` members for freeing of state.
     dev->needs_free_netdev = true;
   }
 
-  static void my_destructor(struct net_device *dev)
+  static int my_ndo_alloc_and_init(struct net_device *dev, void *arg)
   {
+    struct my_device_priv *priv = netdev_priv(dev);
+    int err;
+
+    priv->obj = some_obj_create();
+    if (!priv->obj)
+      return -ENOMEM;
+
+    err = some_init(priv);
+    if (err)
+      goto err_some_free;
+
+    return 0;
+
+  err_some_uninit:
     some_obj_destroy(priv->obj);
+    return err;
+  }
+
+  static void my_ndo_uninit(struct net_device *dev)
+  {
     some_uninit(priv);
   }
 
+  static void my_ndo_release(struct net_device *dev)
+  {
+    some_obj_destroy(priv->obj);
+  }
+
   int create_link()
   {
     struct my_device_priv *priv;
@@ -118,19 +143,6 @@ and ``priv_destructor`` members for freeing of state.
       return -ENOMEM;
     priv = netdev_priv(dev);
 
-    /* Implicit constructor */
-    err = some_init(priv);
-    if (err)
-      goto err_free_dev;
-
-    priv->obj = some_obj_create();
-    if (!priv->obj) {
-      err = -ENOMEM;
-      goto err_some_uninit;
-    }
-    /* End of constructor, set the destructor: */
-    dev->priv_destructor = my_destructor;
-
     err = register_netdevice(dev);
     if (err)
       /* register_netdevice() calls destructor on failure */
@@ -149,13 +161,9 @@ and ``priv_destructor`` members for freeing of state.
     return err;
   }
 
-If struct net_device.priv_destructor is set it will be called by the core
-some time after unregister_netdevice(), it will also be called if
-register_netdevice() fails. The callback may be invoked with or without
-``rtnl_lock`` held.
-
-There is no explicit constructor callback, driver "constructs" the private
-netdev state after allocating it and before registration.
+``.ndo_alloc_and_init`` prepares and allocates all the state. ``.ndo_uninit``
+cleans up what can be cleaned up while not all references are gone,
+``.ndo_release`` cleans up what needs to wait for references to be released.
 
 Setting struct net_device.needs_free_netdev makes core call free_netdevice()
 automatically after unregister_netdevice() when all references to the device
@@ -171,16 +179,20 @@ will defer some of the processing until ``rtnl_lock`` is released.
 Devices spawned from struct rtnl_link_ops should never free the
 struct net_device directly.
 
-.ndo_init and .ndo_uninit
-~~~~~~~~~~~~~~~~~~~~~~~~~
+.ndo_alloc_and_init, .ndo_uninit and .ndo_release
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``.ndo_alloc_and_init`` and ``.ndo_uninit`` callbacks are called during
+net_device registration and de-registration, under ``rtnl_lock``. Drivers can
+use those e.g. when parts of their init process need to run under ``rtnl_lock``.
 
-``.ndo_init`` and ``.ndo_uninit`` callbacks are called during net_device
-registration and de-registration, under ``rtnl_lock``. Drivers can use
-those e.g. when parts of their init process need to run under ``rtnl_lock``.
+``.ndo_alloc_and_init`` runs before device is visible in the system,
+``.ndo_uninit`` runs during de-registering after device is closed but
+other subsystems may still have outstanding references to the netdevice.
 
-``.ndo_init`` runs before device is visible in the system, ``.ndo_uninit``
-runs during de-registering after device is closed but other subsystems
-may still have outstanding references to the netdevice.
+For freeing state which needs to remain in place until all netdev references
+had been released use ``.ndo_release``. Note, however, that ``.ndo_release``
+does not give guarantees on whether the ``rtnl_lock`` is held.
 
 MTU
 ===
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 050aa4d95a69..97ed639a04f0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1033,16 +1033,6 @@ struct netdev_net_notifier {
  * The following hooks can be defined; unless noted otherwise, they are
  * optional and can be filled with a null pointer.
  *
- * int (*ndo_init)(struct net_device *dev);
- *     This function is called once when a network device is registered.
- *     The network device can use this for any late stage initialization
- *     or semantic validation. It can fail with an error code which will
- *     be propagated back to register_netdev.
- *
- * void (*ndo_uninit)(struct net_device *dev);
- *     This function is called when device is unregistered or when registration
- *     fails. It is not called if init fails.
- *
  * int (*ndo_open)(struct net_device *dev);
  *     This function is called when a network device transitions to the up
  *     state.
@@ -1365,8 +1355,32 @@ struct netdev_net_notifier {
  *	free running cycle counter.
  */
 struct net_device_ops {
+	/*
+	 * @ndo_init:
+	 * @ndo_alloc_and_init:
+	 *	Called once when a network device is registered.
+	 *	The network device can use this for any late stage
+	 *	initialization or semantic validation. It can fail with an error
+	 *	code which will be propagated back to register_netdev().
+	 *	Context: sleeping, RTNL lock held.
+	 * @ndo_uninit:
+	 *	Called when device is unregistered or when registration fails
+	 *	after calling @ndo_alloc_and_init. Outstanding references
+	 *	to the netdev may still exist.
+	 *	Context: sleeping, RTNL lock held.
+	 * @ndo_release:
+	 *	Called when all netdev references had been released when
+	 *	device is unregistered or when registration fails after calling
+	 *	@ndo_init. Use this callback to free device-specific state
+	 *	which may be accessed without the RTNL lock, only a reference
+	 *	on the netdev.
+	 *	Context: sleeping, RTNL *may* be held!
+	 */
 	int			(*ndo_init)(struct net_device *dev);
+	int			(*ndo_alloc_and_init)(struct net_device *dev,
+						      void *arg);
 	void			(*ndo_uninit)(struct net_device *dev);
+	void			(*ndo_release)(struct net_device *dev);
 	int			(*ndo_open)(struct net_device *dev);
 	int			(*ndo_stop)(struct net_device *dev);
 	netdev_tx_t		(*ndo_start_xmit)(struct sk_buff *skb,
@@ -3001,7 +3015,11 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
 	return ret;
 }
 
-int register_netdevice(struct net_device *dev);
+int register_netdevice_arg(struct net_device *dev, void *arg);
+static inline int register_netdevice(struct net_device *dev)
+{
+	return register_netdevice_arg(dev, NULL);
+}
 void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
 void unregister_netdevice_many(struct list_head *head);
 static inline void unregister_netdevice(struct net_device *dev)
diff --git a/net/core/dev.c b/net/core/dev.c
index 91b7e7784da9..c39a623267e0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9927,8 +9927,9 @@ void netif_tx_stop_all_queues(struct net_device *dev)
 EXPORT_SYMBOL(netif_tx_stop_all_queues);
 
 /**
- *	register_netdevice	- register a network device
+ *	register_netdevice_arg() - register a network device
  *	@dev: device to register
+ *	@arg: optional argument for .ndo_alloc_and_init
  *
  *	Take a completed network device structure and add it to the kernel
  *	interfaces. A %NETDEV_REGISTER message is sent to the netdev notifier
@@ -9943,7 +9944,7 @@ EXPORT_SYMBOL(netif_tx_stop_all_queues);
  *	will not get the same name.
  */
 
-int register_netdevice(struct net_device *dev)
+int register_netdevice_arg(struct net_device *dev, void *arg)
 {
 	int ret;
 	struct net *net = dev_net(dev);
@@ -9976,8 +9977,11 @@ int register_netdevice(struct net_device *dev)
 		goto out;
 
 	/* Init, if this function is available */
-	if (dev->netdev_ops->ndo_init) {
-		ret = dev->netdev_ops->ndo_init(dev);
+	if (dev->netdev_ops->ndo_init || dev->netdev_ops->ndo_alloc_and_init) {
+		if (dev->netdev_ops->ndo_init)
+			ret = dev->netdev_ops->ndo_init(dev);
+		else if (dev->netdev_ops->ndo_alloc_and_init)
+			ret = dev->netdev_ops->ndo_alloc_and_init(dev, arg);
 		if (ret) {
 			if (ret > 0)
 				ret = -EIO;
@@ -10102,13 +10106,15 @@ int register_netdevice(struct net_device *dev)
 err_uninit:
 	if (dev->netdev_ops->ndo_uninit)
 		dev->netdev_ops->ndo_uninit(dev);
+	if (dev->netdev_ops->ndo_release)
+		dev->netdev_ops->ndo_release(dev);
 	if (dev->priv_destructor)
 		dev->priv_destructor(dev);
 err_free_name:
 	netdev_name_node_free(dev->name_node);
 	goto out;
 }
-EXPORT_SYMBOL(register_netdevice);
+EXPORT_SYMBOL(register_netdevice_arg);
 
 /**
  *	init_dummy_netdev	- init a dummy network device for NAPI
@@ -10351,6 +10357,8 @@ void netdev_run_todo(void)
 #if IS_ENABLED(CONFIG_DECNET)
 		WARN_ON(dev->dn_ptr);
 #endif
+		if (dev->netdev_ops->ndo_release)
+			dev->netdev_ops->ndo_release(dev);
 		if (dev->priv_destructor)
 			dev->priv_destructor(dev);
 		if (dev->needs_free_netdev)
-- 
2.34.3


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC net-next] net: add ndo_alloc_and_init and ndo_release to replace priv_destructor
  2022-05-11 19:12 [RFC net-next] net: add ndo_alloc_and_init and ndo_release to replace priv_destructor Jakub Kicinski
@ 2022-05-11 21:11 ` Jakub Kicinski
  2022-05-11 23:43 ` Jason Gunthorpe
  1 sibling, 0 replies; 4+ messages in thread
From: Jakub Kicinski @ 2022-05-11 21:11 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, johannes, pablo, laforge, Jason, jgg, leonro

On Wed, 11 May 2022 12:12:18 -0700 Jakub Kicinski wrote:
> That's best I can come up. Using .ndo_free as a name may seem
> like a better option but that may make people think it's called
> from free_netdev().

Maybe ndo_destroy is better 🤔

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC net-next] net: add ndo_alloc_and_init and ndo_release to replace priv_destructor
  2022-05-11 19:12 [RFC net-next] net: add ndo_alloc_and_init and ndo_release to replace priv_destructor Jakub Kicinski
  2022-05-11 21:11 ` Jakub Kicinski
@ 2022-05-11 23:43 ` Jason Gunthorpe
  2022-05-13  5:47   ` Leon Romanovsky
  1 sibling, 1 reply; 4+ messages in thread
From: Jason Gunthorpe @ 2022-05-11 23:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, johannes, pablo, laforge, Jason, leonro

On Wed, May 11, 2022 at 12:12:18PM -0700, Jakub Kicinski wrote:
> Old API
> -------
> Our current API includes .ndo_init, .ndo_uninit and .priv_destructor.
> First two are part of netdev_ops the last one is a member of netdevice
> and can be overwritten at will by the drivers.

ipoib runs into this trouble, it would be nice to see it improved, but
I'm not seeing how this helps..

> BTW as far as I can tell there is no strong reason for .ndo_init
> to exist. It gets called early during registration after
> netdev's name gets filled in, and none of the users I checked
> care about the name. They could have as well run the code
> they have in .ndo_init before calling register_netdevice().

Well, exactly. This is sort of where ipoib ends up - and it is
complicated enough that moving everything into ndo_init isn't
something obviously doable - there are several ndo_inits and several
different driver flows involved here.

So, the proposal here seems to be to rename ndo_init but otherwise
keep the lifecycle model the same - and with a now const ops it is
basically hopeless to do anything that needs to be undone before
register_netdev()?

I would be happier if netdev was more like everything else and allowed
a clean alloc/free vs register/unregister pairing. The usual lifecycle
model cast to netdev terms would have the release function set around
alloc_netdev and always called once at free_netdev.

The caller should set the ops and release function after it has
completed initializing whatever its release will undo, similare to how
device_initialize()/put_device works.

IIRC priv_destructor is really only needed when needs_free_netdev is
used, and the real point is to reliably inject code before
free_netdev.

> A common workaround is to set .priv_destructor after
> register_netdevice() succeeds. This works fine in practice but
> is not always correct. Theoretically something may nack the

ipoib does this, but I don't think it has a problem because it does
the priv_destructor action manually on the error path after
register_netdev fails and then leaves priv_destructor NULL'd so that
the later queued unregister doesn't double call it.

> We want an intuitive API, which I think should mean symmetric
> ndo callbacks only. There is no point in having two steps at
> init time so this patch renames .ndo_init as .ndo_alloc_and_init.

What does it alloc? It doesn't alloc the priv, that was done
earlier. It seems like a more confusing name than ndo_init to me.

Jason

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC net-next] net: add ndo_alloc_and_init and ndo_release to replace priv_destructor
  2022-05-11 23:43 ` Jason Gunthorpe
@ 2022-05-13  5:47   ` Leon Romanovsky
  0 siblings, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2022-05-13  5:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, johannes, pablo,
	laforge, Jason

On Wed, May 11, 2022 at 08:43:28PM -0300, Jason Gunthorpe wrote:
> On Wed, May 11, 2022 at 12:12:18PM -0700, Jakub Kicinski wrote:
> > Old API

<...>

> I would be happier if netdev was more like everything else and allowed
> a clean alloc/free vs register/unregister pairing. The usual lifecycle
> model cast to netdev terms would have the release function set around
> alloc_netdev and always called once at free_netdev.
> 
> The caller should set the ops and release function after it has
> completed initializing whatever its release will undo, similare to how
> device_initialize()/put_device works.

Exactly, it will give much clearer picture of netdev life cycle than it
is now.

Thanks

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-05-13  5:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-11 19:12 [RFC net-next] net: add ndo_alloc_and_init and ndo_release to replace priv_destructor Jakub Kicinski
2022-05-11 21:11 ` Jakub Kicinski
2022-05-11 23:43 ` Jason Gunthorpe
2022-05-13  5:47   ` Leon Romanovsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.