All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
@ 2021-11-25 16:51 Lahav Schlesinger
  2021-11-28  7:33 ` Leon Romanovsky
  0 siblings, 1 reply; 10+ messages in thread
From: Lahav Schlesinger @ 2021-11-25 16:51 UTC (permalink / raw)
  To: netdev; +Cc: kuba, dsahern

Under large scale, some routers are required to support tens of thousands
of devices at once, both physical and virtual (e.g. loopbacks, tunnels,
vrfs, etc).
At times such routers are required to delete massive amounts of devices
at once, such as when a factory reset is performed on the router (causing
a deletion of all devices), or when a configuration is restored after an
upgrade, or as a request from an operator.

Currently there are 2 means of deleting devices using Netlink:
1. Deleting a single device (either by ifindex using ifinfomsg::ifi_index,
or by name using IFLA_IFNAME)
2. Delete all device that belong to a group (using IFLA_GROUP)

Deletion of devices one-by-one has poor performance on large scale of
devices compared to "group deletion":
After all device are handled, netdev_run_todo() is called which
calls rcu_barrier() to finish any outstanding RCU callbacks that were
registered during the deletion of the device, then wait until the
refcount of all the devices is 0, then perform final cleanups.

However, calling rcu_barrier() is a very costly operation, each call
taking in the order of 10s of milliseconds.

When deleting a large number of device one-by-one, rcu_barrier()
will be called for each device being deleted.
As an example, following benchmark deletes 10K loopback devices,
all of which are UP and with only IPv6 LLA being configured:

1. Deleting one-by-one using 1 thread : 243 seconds
2. Deleting one-by-one using 10 thread: 70 seconds
3. Deleting one-by-one using 50 thread: 54 seconds
4. Deleting all using "group deletion": 30 seconds

Note that even though the deletion logic takes place under the rtnl
lock, since the call to rcu_barrier() is outside the lock we gain
some improvements.

But, while "group deletion" is the fastest, it is not suited for
deleting large number of arbitrary devices which are unknown a head of
time. Furthermore, moving large number of devices to a group is also a
costly operation.

This patch adds support for passing an arbitrary list of ifindex of
devices to delete with a new IFLA_IFINDEX_LIST attribute.
This gives a more fine-grained control over which devices to delete,
while still resulting in rcu_barrier() being called only once.
Indeed, the timings of using this new API to delete 10K devices is
the same as using the existing "group" deletion.

The size constraints on the attribute means the API can delete at most
16382 devices in a single request.

Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
---
v2 -> v3
 - Rename 'ifindex_list' to 'ifindices', and pass it as int*
 - Clamp 'ops' variable in second loop.

v1 -> v2
 - Unset 'len' of IFLA_IFINDEX_LIST in policy.
 - Use __dev_get_by_index() instead of n^2 loop.
 - Return -ENODEV if any ifindex is not present.
 - Saved devices in an array.
 - Fix formatting.

 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 50 ++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..f950bf6ed025 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -348,6 +348,7 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 
+	IFLA_IFINDEX_LIST,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index fd030e02f16d..49d1a3954a01 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1880,6 +1880,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
 	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
+	[IFLA_IFINDEX_LIST]	= { .type = NLA_BINARY },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -3050,6 +3051,52 @@ static int rtnl_group_dellink(const struct net *net, int group)
 	return 0;
 }
 
+static int rtnl_list_dellink(struct net *net, int *ifindices, int size)
+{
+	const int num_devices = size / sizeof(int);
+	struct net_device **dev_list;
+	LIST_HEAD(list_kill);
+	int i, ret;
+
+	if (size <= 0 || size % sizeof(int))
+		return -EINVAL;
+
+	dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
+	if (!dev_list)
+		return -ENOMEM;
+
+	for (i = 0; i < num_devices; i++) {
+		const struct rtnl_link_ops *ops;
+		struct net_device *dev;
+
+		ret = -ENODEV;
+		dev = __dev_get_by_index(net, ifindices[i]);
+		if (!dev)
+			goto out_free;
+
+		ret = -EOPNOTSUPP;
+		ops = dev->rtnl_link_ops;
+		if (!ops || !ops->dellink)
+			goto out_free;
+
+		dev_list[i] = dev;
+	}
+
+	for (i = 0; i < num_devices; i++) {
+		struct net_device *dev = dev_list[i];
+
+		dev->rtnl_link_ops->dellink(dev, &list_kill);
+	}
+
+	unregister_netdevice_many(&list_kill);
+
+	ret = 0;
+
+out_free:
+	kfree(dev_list);
+	return ret;
+}
+
 int rtnl_delete_link(struct net_device *dev)
 {
 	const struct rtnl_link_ops *ops;
@@ -3102,6 +3149,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 				   tb[IFLA_ALT_IFNAME], NULL);
 	else if (tb[IFLA_GROUP])
 		err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
+	else if (tb[IFLA_IFINDEX_LIST])
+		err = rtnl_list_dellink(tgt_net, nla_data(tb[IFLA_IFINDEX_LIST]),
+					nla_len(tb[IFLA_IFINDEX_LIST]));
 	else
 		goto out;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-11-25 16:51 [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion Lahav Schlesinger
@ 2021-11-28  7:33 ` Leon Romanovsky
  2021-11-28 11:13   ` Lahav Schlesinger
  0 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-11-28  7:33 UTC (permalink / raw)
  To: Lahav Schlesinger; +Cc: netdev, kuba, dsahern

On Thu, Nov 25, 2021 at 06:51:46PM +0200, Lahav Schlesinger wrote:
> Under large scale, some routers are required to support tens of thousands
> of devices at once, both physical and virtual (e.g. loopbacks, tunnels,
> vrfs, etc).
> At times such routers are required to delete massive amounts of devices
> at once, such as when a factory reset is performed on the router (causing
> a deletion of all devices), or when a configuration is restored after an
> upgrade, or as a request from an operator.
> 
> Currently there are 2 means of deleting devices using Netlink:
> 1. Deleting a single device (either by ifindex using ifinfomsg::ifi_index,
> or by name using IFLA_IFNAME)
> 2. Delete all device that belong to a group (using IFLA_GROUP)
> 
> Deletion of devices one-by-one has poor performance on large scale of
> devices compared to "group deletion":
> After all device are handled, netdev_run_todo() is called which
> calls rcu_barrier() to finish any outstanding RCU callbacks that were
> registered during the deletion of the device, then wait until the
> refcount of all the devices is 0, then perform final cleanups.
> 
> However, calling rcu_barrier() is a very costly operation, each call
> taking in the order of 10s of milliseconds.
> 
> When deleting a large number of device one-by-one, rcu_barrier()
> will be called for each device being deleted.
> As an example, following benchmark deletes 10K loopback devices,
> all of which are UP and with only IPv6 LLA being configured:
> 
> 1. Deleting one-by-one using 1 thread : 243 seconds
> 2. Deleting one-by-one using 10 thread: 70 seconds
> 3. Deleting one-by-one using 50 thread: 54 seconds
> 4. Deleting all using "group deletion": 30 seconds
> 
> Note that even though the deletion logic takes place under the rtnl
> lock, since the call to rcu_barrier() is outside the lock we gain
> some improvements.
> 
> But, while "group deletion" is the fastest, it is not suited for
> deleting large number of arbitrary devices which are unknown a head of
> time. Furthermore, moving large number of devices to a group is also a
> costly operation.
> 
> This patch adds support for passing an arbitrary list of ifindex of
> devices to delete with a new IFLA_IFINDEX_LIST attribute.
> This gives a more fine-grained control over which devices to delete,
> while still resulting in rcu_barrier() being called only once.
> Indeed, the timings of using this new API to delete 10K devices is
> the same as using the existing "group" deletion.
> 
> The size constraints on the attribute means the API can delete at most
> 16382 devices in a single request.
> 
> Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
> ---
> v2 -> v3
>  - Rename 'ifindex_list' to 'ifindices', and pass it as int*
>  - Clamp 'ops' variable in second loop.
> 
> v1 -> v2
>  - Unset 'len' of IFLA_IFINDEX_LIST in policy.
>  - Use __dev_get_by_index() instead of n^2 loop.
>  - Return -ENODEV if any ifindex is not present.
>  - Saved devices in an array.
>  - Fix formatting.
> 
>  include/uapi/linux/if_link.h |  1 +
>  net/core/rtnetlink.c         | 50 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 51 insertions(+)
> 
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index eebd3894fe89..f950bf6ed025 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -348,6 +348,7 @@ enum {
>  	IFLA_PARENT_DEV_NAME,
>  	IFLA_PARENT_DEV_BUS_NAME,
>  
> +	IFLA_IFINDEX_LIST,
>  	__IFLA_MAX
>  };
>  
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index fd030e02f16d..49d1a3954a01 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1880,6 +1880,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>  	[IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
>  	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
>  	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
> +	[IFLA_IFINDEX_LIST]	= { .type = NLA_BINARY },
>  };
>  
>  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> @@ -3050,6 +3051,52 @@ static int rtnl_group_dellink(const struct net *net, int group)
>  	return 0;
>  }
>  
> +static int rtnl_list_dellink(struct net *net, int *ifindices, int size)
> +{
> +	const int num_devices = size / sizeof(int);
> +	struct net_device **dev_list;
> +	LIST_HEAD(list_kill);
> +	int i, ret;
> +
> +	if (size <= 0 || size % sizeof(int))
> +		return -EINVAL;
> +
> +	dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
> +	if (!dev_list)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < num_devices; i++) {
> +		const struct rtnl_link_ops *ops;
> +		struct net_device *dev;
> +
> +		ret = -ENODEV;
> +		dev = __dev_get_by_index(net, ifindices[i]);
> +		if (!dev)
> +			goto out_free;
> +
> +		ret = -EOPNOTSUPP;
> +		ops = dev->rtnl_link_ops;
> +		if (!ops || !ops->dellink)
> +			goto out_free;

I'm just curious, how does user know that specific device doesn't
have ->delink implementation? It is important to know because you
are failing whole batch deletion. At least for single delink, users
have chance to skip "failed" one and continue.

Thanks

> +
> +		dev_list[i] = dev;
> +	}
> +
> +	for (i = 0; i < num_devices; i++) {
> +		struct net_device *dev = dev_list[i];
> +
> +		dev->rtnl_link_ops->dellink(dev, &list_kill);
> +	}
> +
> +	unregister_netdevice_many(&list_kill);
> +
> +	ret = 0;
> +
> +out_free:
> +	kfree(dev_list);
> +	return ret;
> +}
> +
>  int rtnl_delete_link(struct net_device *dev)
>  {
>  	const struct rtnl_link_ops *ops;
> @@ -3102,6 +3149,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>  				   tb[IFLA_ALT_IFNAME], NULL);
>  	else if (tb[IFLA_GROUP])
>  		err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
> +	else if (tb[IFLA_IFINDEX_LIST])
> +		err = rtnl_list_dellink(tgt_net, nla_data(tb[IFLA_IFINDEX_LIST]),
> +					nla_len(tb[IFLA_IFINDEX_LIST]));
>  	else
>  		goto out;
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-11-28  7:33 ` Leon Romanovsky
@ 2021-11-28 11:13   ` Lahav Schlesinger
  2021-11-28 11:43     ` Leon Romanovsky
  0 siblings, 1 reply; 10+ messages in thread
From: Lahav Schlesinger @ 2021-11-28 11:13 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: netdev, kuba, dsahern

On Sun, Nov 28, 2021 at 09:33:01AM +0200, Leon Romanovsky wrote:
> CAUTION: External E-Mail - Use caution with links and attachments
>
>
> On Thu, Nov 25, 2021 at 06:51:46PM +0200, Lahav Schlesinger wrote:
> > Under large scale, some routers are required to support tens of thousands
> > of devices at once, both physical and virtual (e.g. loopbacks, tunnels,
> > vrfs, etc).
> > At times such routers are required to delete massive amounts of devices
> > at once, such as when a factory reset is performed on the router (causing
> > a deletion of all devices), or when a configuration is restored after an
> > upgrade, or as a request from an operator.
> >
> > Currently there are 2 means of deleting devices using Netlink:
> > 1. Deleting a single device (either by ifindex using ifinfomsg::ifi_index,
> > or by name using IFLA_IFNAME)
> > 2. Delete all device that belong to a group (using IFLA_GROUP)
> >
> > Deletion of devices one-by-one has poor performance on large scale of
> > devices compared to "group deletion":
> > After all device are handled, netdev_run_todo() is called which
> > calls rcu_barrier() to finish any outstanding RCU callbacks that were
> > registered during the deletion of the device, then wait until the
> > refcount of all the devices is 0, then perform final cleanups.
> >
> > However, calling rcu_barrier() is a very costly operation, each call
> > taking in the order of 10s of milliseconds.
> >
> > When deleting a large number of device one-by-one, rcu_barrier()
> > will be called for each device being deleted.
> > As an example, following benchmark deletes 10K loopback devices,
> > all of which are UP and with only IPv6 LLA being configured:
> >
> > 1. Deleting one-by-one using 1 thread : 243 seconds
> > 2. Deleting one-by-one using 10 thread: 70 seconds
> > 3. Deleting one-by-one using 50 thread: 54 seconds
> > 4. Deleting all using "group deletion": 30 seconds
> >
> > Note that even though the deletion logic takes place under the rtnl
> > lock, since the call to rcu_barrier() is outside the lock we gain
> > some improvements.
> >
> > But, while "group deletion" is the fastest, it is not suited for
> > deleting large number of arbitrary devices which are unknown a head of
> > time. Furthermore, moving large number of devices to a group is also a
> > costly operation.
> >
> > This patch adds support for passing an arbitrary list of ifindex of
> > devices to delete with a new IFLA_IFINDEX_LIST attribute.
> > This gives a more fine-grained control over which devices to delete,
> > while still resulting in rcu_barrier() being called only once.
> > Indeed, the timings of using this new API to delete 10K devices is
> > the same as using the existing "group" deletion.
> >
> > The size constraints on the attribute means the API can delete at most
> > 16382 devices in a single request.
> >
> > Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
> > ---
> > v2 -> v3
> >  - Rename 'ifindex_list' to 'ifindices', and pass it as int*
> >  - Clamp 'ops' variable in second loop.
> >
> > v1 -> v2
> >  - Unset 'len' of IFLA_IFINDEX_LIST in policy.
> >  - Use __dev_get_by_index() instead of n^2 loop.
> >  - Return -ENODEV if any ifindex is not present.
> >  - Saved devices in an array.
> >  - Fix formatting.
> >
> >  include/uapi/linux/if_link.h |  1 +
> >  net/core/rtnetlink.c         | 50 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 51 insertions(+)
> >
> > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> > index eebd3894fe89..f950bf6ed025 100644
> > --- a/include/uapi/linux/if_link.h
> > +++ b/include/uapi/linux/if_link.h
> > @@ -348,6 +348,7 @@ enum {
> >       IFLA_PARENT_DEV_NAME,
> >       IFLA_PARENT_DEV_BUS_NAME,
> >
> > +     IFLA_IFINDEX_LIST,
> >       __IFLA_MAX
> >  };
> >
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index fd030e02f16d..49d1a3954a01 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -1880,6 +1880,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> >       [IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
> >       [IFLA_NEW_IFINDEX]      = NLA_POLICY_MIN(NLA_S32, 1),
> >       [IFLA_PARENT_DEV_NAME]  = { .type = NLA_NUL_STRING },
> > +     [IFLA_IFINDEX_LIST]     = { .type = NLA_BINARY },
> >  };
> >
> >  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> > @@ -3050,6 +3051,52 @@ static int rtnl_group_dellink(const struct net *net, int group)
> >       return 0;
> >  }
> >
> > +static int rtnl_list_dellink(struct net *net, int *ifindices, int size)
> > +{
> > +     const int num_devices = size / sizeof(int);
> > +     struct net_device **dev_list;
> > +     LIST_HEAD(list_kill);
> > +     int i, ret;
> > +
> > +     if (size <= 0 || size % sizeof(int))
> > +             return -EINVAL;
> > +
> > +     dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
> > +     if (!dev_list)
> > +             return -ENOMEM;
> > +
> > +     for (i = 0; i < num_devices; i++) {
> > +             const struct rtnl_link_ops *ops;
> > +             struct net_device *dev;
> > +
> > +             ret = -ENODEV;
> > +             dev = __dev_get_by_index(net, ifindices[i]);
> > +             if (!dev)
> > +                     goto out_free;
> > +
> > +             ret = -EOPNOTSUPP;
> > +             ops = dev->rtnl_link_ops;
> > +             if (!ops || !ops->dellink)
> > +                     goto out_free;
>
> I'm just curious, how does user know that specific device doesn't
> have ->delink implementation? It is important to know because you
> are failing whole batch deletion. At least for single delink, users
> have chance to skip "failed" one and continue.
>
> Thanks

Hi Leon, I don't see any immediate way users can get this information.
I do think that failing the whole request is better than silently
ignoring such devices.

Perhaps an alternative is to return the unsupported device's name in an
extack? To make NL_SET_ERR_MSG() support string formatting this will
require changing netlink_ext_ack::_msg to be an array though (skimming
over the calls to NL_SET_ERR_MSG(), a buffer of size say 128 should be
large enough).

>
> > +
> > +             dev_list[i] = dev;
> > +     }
> > +
> > +     for (i = 0; i < num_devices; i++) {
> > +             struct net_device *dev = dev_list[i];
> > +
> > +             dev->rtnl_link_ops->dellink(dev, &list_kill);
> > +     }
> > +
> > +     unregister_netdevice_many(&list_kill);
> > +
> > +     ret = 0;
> > +
> > +out_free:
> > +     kfree(dev_list);
> > +     return ret;
> > +}
> > +
> >  int rtnl_delete_link(struct net_device *dev)
> >  {
> >       const struct rtnl_link_ops *ops;
> > @@ -3102,6 +3149,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >                                  tb[IFLA_ALT_IFNAME], NULL);
> >       else if (tb[IFLA_GROUP])
> >               err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
> > +     else if (tb[IFLA_IFINDEX_LIST])
> > +             err = rtnl_list_dellink(tgt_net, nla_data(tb[IFLA_IFINDEX_LIST]),
> > +                                     nla_len(tb[IFLA_IFINDEX_LIST]));
> >       else
> >               goto out;
> >
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-11-28 11:13   ` Lahav Schlesinger
@ 2021-11-28 11:43     ` Leon Romanovsky
  2021-11-28 19:17       ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-11-28 11:43 UTC (permalink / raw)
  To: Lahav Schlesinger; +Cc: netdev, kuba, dsahern

On Sun, Nov 28, 2021 at 01:13:14PM +0200, Lahav Schlesinger wrote:
> On Sun, Nov 28, 2021 at 09:33:01AM +0200, Leon Romanovsky wrote:
> > CAUTION: External E-Mail - Use caution with links and attachments
> >
> >
> > On Thu, Nov 25, 2021 at 06:51:46PM +0200, Lahav Schlesinger wrote:
> > > Under large scale, some routers are required to support tens of thousands
> > > of devices at once, both physical and virtual (e.g. loopbacks, tunnels,
> > > vrfs, etc).
> > > At times such routers are required to delete massive amounts of devices
> > > at once, such as when a factory reset is performed on the router (causing
> > > a deletion of all devices), or when a configuration is restored after an
> > > upgrade, or as a request from an operator.
> > >
> > > Currently there are 2 means of deleting devices using Netlink:
> > > 1. Deleting a single device (either by ifindex using ifinfomsg::ifi_index,
> > > or by name using IFLA_IFNAME)
> > > 2. Delete all device that belong to a group (using IFLA_GROUP)
> > >
> > > Deletion of devices one-by-one has poor performance on large scale of
> > > devices compared to "group deletion":
> > > After all device are handled, netdev_run_todo() is called which
> > > calls rcu_barrier() to finish any outstanding RCU callbacks that were
> > > registered during the deletion of the device, then wait until the
> > > refcount of all the devices is 0, then perform final cleanups.
> > >
> > > However, calling rcu_barrier() is a very costly operation, each call
> > > taking in the order of 10s of milliseconds.
> > >
> > > When deleting a large number of device one-by-one, rcu_barrier()
> > > will be called for each device being deleted.
> > > As an example, following benchmark deletes 10K loopback devices,
> > > all of which are UP and with only IPv6 LLA being configured:
> > >
> > > 1. Deleting one-by-one using 1 thread : 243 seconds
> > > 2. Deleting one-by-one using 10 thread: 70 seconds
> > > 3. Deleting one-by-one using 50 thread: 54 seconds
> > > 4. Deleting all using "group deletion": 30 seconds
> > >
> > > Note that even though the deletion logic takes place under the rtnl
> > > lock, since the call to rcu_barrier() is outside the lock we gain
> > > some improvements.
> > >
> > > But, while "group deletion" is the fastest, it is not suited for
> > > deleting large number of arbitrary devices which are unknown a head of
> > > time. Furthermore, moving large number of devices to a group is also a
> > > costly operation.
> > >
> > > This patch adds support for passing an arbitrary list of ifindex of
> > > devices to delete with a new IFLA_IFINDEX_LIST attribute.
> > > This gives a more fine-grained control over which devices to delete,
> > > while still resulting in rcu_barrier() being called only once.
> > > Indeed, the timings of using this new API to delete 10K devices is
> > > the same as using the existing "group" deletion.
> > >
> > > The size constraints on the attribute means the API can delete at most
> > > 16382 devices in a single request.
> > >
> > > Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
> > > ---
> > > v2 -> v3
> > >  - Rename 'ifindex_list' to 'ifindices', and pass it as int*
> > >  - Clamp 'ops' variable in second loop.
> > >
> > > v1 -> v2
> > >  - Unset 'len' of IFLA_IFINDEX_LIST in policy.
> > >  - Use __dev_get_by_index() instead of n^2 loop.
> > >  - Return -ENODEV if any ifindex is not present.
> > >  - Saved devices in an array.
> > >  - Fix formatting.
> > >
> > >  include/uapi/linux/if_link.h |  1 +
> > >  net/core/rtnetlink.c         | 50 ++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 51 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> > > index eebd3894fe89..f950bf6ed025 100644
> > > --- a/include/uapi/linux/if_link.h
> > > +++ b/include/uapi/linux/if_link.h
> > > @@ -348,6 +348,7 @@ enum {
> > >       IFLA_PARENT_DEV_NAME,
> > >       IFLA_PARENT_DEV_BUS_NAME,
> > >
> > > +     IFLA_IFINDEX_LIST,
> > >       __IFLA_MAX
> > >  };
> > >
> > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > index fd030e02f16d..49d1a3954a01 100644
> > > --- a/net/core/rtnetlink.c
> > > +++ b/net/core/rtnetlink.c
> > > @@ -1880,6 +1880,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> > >       [IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
> > >       [IFLA_NEW_IFINDEX]      = NLA_POLICY_MIN(NLA_S32, 1),
> > >       [IFLA_PARENT_DEV_NAME]  = { .type = NLA_NUL_STRING },
> > > +     [IFLA_IFINDEX_LIST]     = { .type = NLA_BINARY },
> > >  };
> > >
> > >  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> > > @@ -3050,6 +3051,52 @@ static int rtnl_group_dellink(const struct net *net, int group)
> > >       return 0;
> > >  }
> > >
> > > +static int rtnl_list_dellink(struct net *net, int *ifindices, int size)
> > > +{
> > > +     const int num_devices = size / sizeof(int);
> > > +     struct net_device **dev_list;
> > > +     LIST_HEAD(list_kill);
> > > +     int i, ret;
> > > +
> > > +     if (size <= 0 || size % sizeof(int))
> > > +             return -EINVAL;
> > > +
> > > +     dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
> > > +     if (!dev_list)
> > > +             return -ENOMEM;
> > > +
> > > +     for (i = 0; i < num_devices; i++) {
> > > +             const struct rtnl_link_ops *ops;
> > > +             struct net_device *dev;
> > > +
> > > +             ret = -ENODEV;
> > > +             dev = __dev_get_by_index(net, ifindices[i]);
> > > +             if (!dev)
> > > +                     goto out_free;
> > > +
> > > +             ret = -EOPNOTSUPP;
> > > +             ops = dev->rtnl_link_ops;
> > > +             if (!ops || !ops->dellink)
> > > +                     goto out_free;
> >
> > I'm just curious, how does user know that specific device doesn't
> > have ->delink implementation? It is important to know because you
> > are failing whole batch deletion. At least for single delink, users
> > have chance to skip "failed" one and continue.
> >
> > Thanks
> 
> Hi Leon, I don't see any immediate way users can get this information.
> I do think that failing the whole request is better than silently
> ignoring such devices.

I don't have any preference here, probably "fail all" is the easiest
solution here.

Thanks

> 
> Perhaps an alternative is to return the unsupported device's name in an
> extack? To make NL_SET_ERR_MSG() support string formatting this will
> require changing netlink_ext_ack::_msg to be an array though (skimming
> over the calls to NL_SET_ERR_MSG(), a buffer of size say 128 should be
> large enough).
> 
> >
> > > +
> > > +             dev_list[i] = dev;
> > > +     }
> > > +
> > > +     for (i = 0; i < num_devices; i++) {
> > > +             struct net_device *dev = dev_list[i];
> > > +
> > > +             dev->rtnl_link_ops->dellink(dev, &list_kill);
> > > +     }
> > > +
> > > +     unregister_netdevice_many(&list_kill);
> > > +
> > > +     ret = 0;
> > > +
> > > +out_free:
> > > +     kfree(dev_list);
> > > +     return ret;
> > > +}
> > > +
> > >  int rtnl_delete_link(struct net_device *dev)
> > >  {
> > >       const struct rtnl_link_ops *ops;
> > > @@ -3102,6 +3149,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> > >                                  tb[IFLA_ALT_IFNAME], NULL);
> > >       else if (tb[IFLA_GROUP])
> > >               err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
> > > +     else if (tb[IFLA_IFINDEX_LIST])
> > > +             err = rtnl_list_dellink(tgt_net, nla_data(tb[IFLA_IFINDEX_LIST]),
> > > +                                     nla_len(tb[IFLA_IFINDEX_LIST]));
> > >       else
> > >               goto out;
> > >
> > > --
> > > 2.25.1
> > >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-11-28 11:43     ` Leon Romanovsky
@ 2021-11-28 19:17       ` David Ahern
  2021-11-29 13:53         ` Lahav Schlesinger
  0 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2021-11-28 19:17 UTC (permalink / raw)
  To: Leon Romanovsky, Lahav Schlesinger; +Cc: netdev, kuba

On 11/28/21 4:43 AM, Leon Romanovsky wrote:
>>>> +static int rtnl_list_dellink(struct net *net, int *ifindices, int size)
>>>> +{
>>>> +     const int num_devices = size / sizeof(int);
>>>> +     struct net_device **dev_list;
>>>> +     LIST_HEAD(list_kill);
>>>> +     int i, ret;
>>>> +
>>>> +     if (size <= 0 || size % sizeof(int))
>>>> +             return -EINVAL;
>>>> +
>>>> +     dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
>>>> +     if (!dev_list)
>>>> +             return -ENOMEM;
>>>> +
>>>> +     for (i = 0; i < num_devices; i++) {
>>>> +             const struct rtnl_link_ops *ops;
>>>> +             struct net_device *dev;
>>>> +
>>>> +             ret = -ENODEV;
>>>> +             dev = __dev_get_by_index(net, ifindices[i]);
>>>> +             if (!dev)
>>>> +                     goto out_free;
>>>> +
>>>> +             ret = -EOPNOTSUPP;
>>>> +             ops = dev->rtnl_link_ops;
>>>> +             if (!ops || !ops->dellink)
>>>> +                     goto out_free;
>>>
>>> I'm just curious, how does user know that specific device doesn't
>>> have ->delink implementation? It is important to know because you
>>> are failing whole batch deletion. At least for single delink, users
>>> have chance to skip "failed" one and continue.
>>>
>>> Thanks
>>
>> Hi Leon, I don't see any immediate way users can get this information.
>> I do think that failing the whole request is better than silently
>> ignoring such devices.
> 
> I don't have any preference here, probably "fail all" is the easiest
> solution here.

Since there is no API to say which devices can not be deleted failing
the group delete because 1 is say a physical device is going to be
frustrating for users. I think the better approach is to delete what you
can and set extack message to 'Some devices can not be deleted.'


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-11-28 19:17       ` David Ahern
@ 2021-11-29 13:53         ` Lahav Schlesinger
  2021-11-29 15:30           ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Lahav Schlesinger @ 2021-11-29 13:53 UTC (permalink / raw)
  To: David Ahern; +Cc: Leon Romanovsky, netdev, kuba

On Sun, Nov 28, 2021 at 12:17:30PM -0700, David Ahern wrote:
> CAUTION: External E-Mail - Use caution with links and attachments
>
>
> On 11/28/21 4:43 AM, Leon Romanovsky wrote:
> >>>> +static int rtnl_list_dellink(struct net *net, int *ifindices, int size)
> >>>> +{
> >>>> +     const int num_devices = size / sizeof(int);
> >>>> +     struct net_device **dev_list;
> >>>> +     LIST_HEAD(list_kill);
> >>>> +     int i, ret;
> >>>> +
> >>>> +     if (size <= 0 || size % sizeof(int))
> >>>> +             return -EINVAL;
> >>>> +
> >>>> +     dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
> >>>> +     if (!dev_list)
> >>>> +             return -ENOMEM;
> >>>> +
> >>>> +     for (i = 0; i < num_devices; i++) {
> >>>> +             const struct rtnl_link_ops *ops;
> >>>> +             struct net_device *dev;
> >>>> +
> >>>> +             ret = -ENODEV;
> >>>> +             dev = __dev_get_by_index(net, ifindices[i]);
> >>>> +             if (!dev)
> >>>> +                     goto out_free;
> >>>> +
> >>>> +             ret = -EOPNOTSUPP;
> >>>> +             ops = dev->rtnl_link_ops;
> >>>> +             if (!ops || !ops->dellink)
> >>>> +                     goto out_free;
> >>>
> >>> I'm just curious, how does user know that specific device doesn't
> >>> have ->delink implementation? It is important to know because you
> >>> are failing whole batch deletion. At least for single delink, users
> >>> have chance to skip "failed" one and continue.
> >>>
> >>> Thanks
> >>
> >> Hi Leon, I don't see any immediate way users can get this information.
> >> I do think that failing the whole request is better than silently
> >> ignoring such devices.
> >
> > I don't have any preference here, probably "fail all" is the easiest
> > solution here.
>
> Since there is no API to say which devices can not be deleted failing
> the group delete because 1 is say a physical device is going to be
> frustrating for users. I think the better approach is to delete what you
> can and set extack message to 'Some devices can not be deleted.'
>

Hi David, while I also don't have any strong preference here, my
reasoning for failing the whole request if one device can't be deleted
was so it will share the behaviour we currently have with group deletion.
If you're okay with this asymmetry I'll send a V4.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-11-29 13:53         ` Lahav Schlesinger
@ 2021-11-29 15:30           ` David Ahern
  2021-11-29 18:10             ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2021-11-29 15:30 UTC (permalink / raw)
  To: Lahav Schlesinger; +Cc: Leon Romanovsky, netdev, kuba

On 11/29/21 6:53 AM, Lahav Schlesinger wrote:
> Hi David, while I also don't have any strong preference here, my
> reasoning for failing the whole request if one device can't be deleted
> was so it will share the behaviour we currently have with group deletion.
> If you're okay with this asymmetry I'll send a V4.

good point - new features should be consistent with existing code.

You can add another attribute to the request to say 'Skip devices that
can not be deleted'.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-11-29 15:30           ` David Ahern
@ 2021-11-29 18:10             ` Jakub Kicinski
  2021-11-29 19:14               ` David Ahern
  2021-11-30  8:04               ` Nicolas Dichtel
  0 siblings, 2 replies; 10+ messages in thread
From: Jakub Kicinski @ 2021-11-29 18:10 UTC (permalink / raw)
  To: David Ahern; +Cc: Lahav Schlesinger, Leon Romanovsky, netdev

On Mon, 29 Nov 2021 08:30:16 -0700 David Ahern wrote:
> On 11/29/21 6:53 AM, Lahav Schlesinger wrote:
> > Hi David, while I also don't have any strong preference here, my
> > reasoning for failing the whole request if one device can't be deleted
> > was so it will share the behaviour we currently have with group deletion.
> > If you're okay with this asymmetry I'll send a V4.  
> 
> good point - new features should be consistent with existing code.
> 
> You can add another attribute to the request to say 'Skip devices that
> can not be deleted'.

The patch is good as is then? I can resurrect it from 'Changes
Requested' and apply.

Any opinion on wrapping the ifindices into separate attrs, Dave?
I don't think the 32k vs 64k max distinction matters all that much,
user can send multiple messages, and we could point the extack at
the correct ifindex's attribute.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-11-29 18:10             ` Jakub Kicinski
@ 2021-11-29 19:14               ` David Ahern
  2021-11-30  8:04               ` Nicolas Dichtel
  1 sibling, 0 replies; 10+ messages in thread
From: David Ahern @ 2021-11-29 19:14 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Lahav Schlesinger, Leon Romanovsky, netdev

On 11/29/21 11:10 AM, Jakub Kicinski wrote:
> On Mon, 29 Nov 2021 08:30:16 -0700 David Ahern wrote:
>> On 11/29/21 6:53 AM, Lahav Schlesinger wrote:
>>> Hi David, while I also don't have any strong preference here, my
>>> reasoning for failing the whole request if one device can't be deleted
>>> was so it will share the behaviour we currently have with group deletion.
>>> If you're okay with this asymmetry I'll send a V4.  
>>
>> good point - new features should be consistent with existing code.
>>
>> You can add another attribute to the request to say 'Skip devices that
>> can not be deleted'.
> 
> The patch is good as is then? I can resurrect it from 'Changes
> Requested' and apply.

Took another look at it. We should fail if a delete request contains
both IFLA_GROUP and IFLA_IFINDEX_LIST since both are not honored. The
logic in rtnl_list_dellink looks ok.

> 
> Any opinion on wrapping the ifindices into separate attrs, Dave?
> I don't think the 32k vs 64k max distinction matters all that much,
> user can send multiple messages, and we could point the extack at
> the correct ifindex's attribute.
> 

no opinion.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-11-29 18:10             ` Jakub Kicinski
  2021-11-29 19:14               ` David Ahern
@ 2021-11-30  8:04               ` Nicolas Dichtel
  1 sibling, 0 replies; 10+ messages in thread
From: Nicolas Dichtel @ 2021-11-30  8:04 UTC (permalink / raw)
  To: Jakub Kicinski, David Ahern; +Cc: Lahav Schlesinger, Leon Romanovsky, netdev

Le 29/11/2021 à 19:10, Jakub Kicinski a écrit :
> On Mon, 29 Nov 2021 08:30:16 -0700 David Ahern wrote:
>> On 11/29/21 6:53 AM, Lahav Schlesinger wrote:
>>> Hi David, while I also don't have any strong preference here, my
>>> reasoning for failing the whole request if one device can't be deleted
>>> was so it will share the behaviour we currently have with group deletion.
>>> If you're okay with this asymmetry I'll send a V4.  
>>
>> good point - new features should be consistent with existing code.
>>
>> You can add another attribute to the request to say 'Skip devices that
>> can not be deleted'.
> 
> The patch is good as is then? I can resurrect it from 'Changes
> Requested' and apply.
> 
> Any opinion on wrapping the ifindices into separate attrs, Dave?
> I don't think the 32k vs 64k max distinction matters all that much,
I agree.

> user can send multiple messages, and we could point the extack at
> the correct ifindex's attribute.
> 
Good point, it would be clearer from an API POV.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-11-30  8:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-25 16:51 [PATCH net-next v3] rtnetlink: Support fine-grained netdevice bulk deletion Lahav Schlesinger
2021-11-28  7:33 ` Leon Romanovsky
2021-11-28 11:13   ` Lahav Schlesinger
2021-11-28 11:43     ` Leon Romanovsky
2021-11-28 19:17       ` David Ahern
2021-11-29 13:53         ` Lahav Schlesinger
2021-11-29 15:30           ` David Ahern
2021-11-29 18:10             ` Jakub Kicinski
2021-11-29 19:14               ` David Ahern
2021-11-30  8:04               ` Nicolas Dichtel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.