All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion
@ 2022-01-04  8:10 Lahav Schlesinger
  2022-01-04 14:32 ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Lahav Schlesinger @ 2022-01-04  8:10 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, kuba, idosch, nicolas.dichtel, nikolay

Under large scale, some routers are required to support tens of thousands
of devices at once, both physical and virtual (e.g. loopbacks, tunnels,
vrfs, etc).
At times such routers are required to delete massive amounts of devices
at once, such as when a factory reset is performed on the router (causing
a deletion of all devices), or when a configuration is restored after an
upgrade, or as a request from an operator.

Currently there are 2 means of deleting devices using Netlink:
1. Deleting a single device (either by ifindex using ifinfomsg::ifi_index,
or by name using IFLA_IFNAME)
2. Delete all device that belong to a group (using IFLA_GROUP)

Deletion of devices one-by-one has poor performance on large scale of
devices compared to "group deletion":
After all device are handled, netdev_run_todo() is called which
calls rcu_barrier() to finish any outstanding RCU callbacks that were
registered during the deletion of the device, then wait until the
refcount of all the devices is 0, then perform final cleanups.

However, calling rcu_barrier() is a very costly operation, each call
taking in the order of 10s of milliseconds.

When deleting a large number of device one-by-one, rcu_barrier()
will be called for each device being deleted.
As an example, following benchmark deletes 10K loopback devices,
all of which are UP and with only IPv6 LLA being configured:

1. Deleting one-by-one using 1 thread : 243 seconds
2. Deleting one-by-one using 10 thread: 70 seconds
3. Deleting one-by-one using 50 thread: 54 seconds
4. Deleting all using "group deletion": 30 seconds

Note that even though the deletion logic takes place under the rtnl
lock, since the call to rcu_barrier() is outside the lock we gain
some improvements.

But, while "group deletion" is the fastest, it is not suited for
deleting large number of arbitrary devices which are unknown a head of
time. Furthermore, moving large number of devices to a group is also a
costly operation.

This patch adds support for passing an arbitrary list of ifindex of
devices to delete with a new IFLA_IFINDEX attribute. A single message
may contain multiple instances of this attribute).
This gives a more fine-grained control over which devices to delete,
while still resulting in rcu_barrier() being called only once.
Indeed, the timings of using this new API to delete 10K devices is
the same as using the existing "group" deletion.

Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
---
v5 -> v6
 - Convert back to single IFLA_IFINDEX_LIST attribute instead of
   IFLA_IFINDEX
 - Added struct net_device::bulk_delete to avoid sorting ifindex list,
   in order to call ->dellink() only once per potentially duplicated ifindex
   (no increase in struct size)
 - Make sure IFLA_IFINDEX_LIST cannot be used in
   setlink()/newlink()/getlink()

v4 -> v5
 - Don't call ->dellink() multiple times if device is duplicated.

v3 -> v4
 - Change single IFLA_INDEX_LIST into multiple IFLA_IFINDEX
 - Fail if passing both IFLA_GROUP and at least one IFLA_IFNEX

v2 -> v3
 - Rename 'ifindex_list' to 'ifindices', and pass it as int*
 - Clamp 'ops' variable in second loop.

v1 -> v2
 - Unset 'len' of IFLA_IFINDEX_LIST in policy.
 - Use __dev_get_by_index() instead of n^2 loop.
 - Return -ENODEV if any ifindex is not present.
 - Saved devices in an array.
 - Fix formatting.

 include/linux/netdevice.h    |  3 ++
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 77 ++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index df049864661d..c3cfbfaf7f06 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1926,6 +1926,8 @@ enum netdev_ml_priv_type {
  *
  *	@threaded:	napi threaded mode is enabled
  *
+ *	@bulk_delete:	Device is marked for of bulk deletion
+ *
  *	@net_notifier_list:	List of per-net netdev notifier block
  *				that follow this device when it is moved
  *				to another network namespace.
@@ -2258,6 +2260,7 @@ struct net_device {
 	bool			proto_down;
 	unsigned		wol_enabled:1;
 	unsigned		threaded:1;
+	unsigned		bulk_delete:1;
 
 	struct list_head	net_notifier_list;
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..f950bf6ed025 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -348,6 +348,7 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 
+	IFLA_IFINDEX_LIST,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index fd030e02f16d..530371767565 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1880,6 +1880,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
 	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
+	[IFLA_IFINDEX_LIST]     = { .type = NLA_BINARY },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -3009,6 +3010,11 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		goto errout;
 	}
 
+	if (tb[IFLA_IFINDEX_LIST]) {
+		NL_SET_ERR_MSG(extack, "ifindex list attribute cannot be used in setlink");
+		goto errout;
+	}
+
 	err = do_setlink(skb, dev, ifm, extack, tb, ifname, 0);
 errout:
 	return err;
@@ -3050,6 +3056,57 @@ static int rtnl_group_dellink(const struct net *net, int group)
 	return 0;
 }
 
+static int rtnl_list_dellink(struct net *net, int *ifindices, int size,
+			     struct netlink_ext_ack *extack)
+{
+	const int num_devices = size / sizeof(int);
+	struct net_device *dev;
+	LIST_HEAD(list_kill);
+	int i, j, ret;
+
+	if (size <= 0 || size % sizeof(int))
+		return -EINVAL;
+
+	for (i = 0; i < num_devices; i++) {
+		const struct rtnl_link_ops *ops;
+
+		ret = -ENODEV;
+		dev = __dev_get_by_index(net, ifindices[i]);
+		if (!dev) {
+			NL_SET_ERR_MSG(extack, "Unknown ifindex");
+			goto cleanup;
+		}
+
+		ret = -EOPNOTSUPP;
+		ops = dev->rtnl_link_ops;
+		if (!ops || !ops->dellink) {
+			NL_SET_ERR_MSG(extack, "Device cannot be deleted");
+			goto cleanup;
+		}
+
+		dev->bulk_delete = 1;
+	}
+
+	for_each_netdev(net, dev) {
+		if (dev->bulk_delete) {
+			dev->rtnl_link_ops->dellink(dev, &list_kill);
+			dev->bulk_delete = 0;
+		}
+	}
+
+	unregister_netdevice_many(&list_kill);
+
+	return 0;
+
+cleanup:
+	for (j = 0; j < i; j++) {
+		dev = __dev_get_by_index(net, ifindices[j]);
+		dev->bulk_delete = 0;
+	}
+
+	return ret;
+}
+
 int rtnl_delete_link(struct net_device *dev)
 {
 	const struct rtnl_link_ops *ops;
@@ -3093,6 +3150,11 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			return PTR_ERR(tgt_net);
 	}
 
+	if (tb[IFLA_GROUP] && tb[IFLA_IFINDEX_LIST]) {
+		NL_SET_ERR_MSG(extack, "Can't pass both IFLA_GROUP and IFLA_IFINDEX_LIST");
+		return -EOPNOTSUPP;
+	}
+
 	err = -EINVAL;
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
@@ -3102,6 +3164,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 				   tb[IFLA_ALT_IFNAME], NULL);
 	else if (tb[IFLA_GROUP])
 		err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
+	else if (tb[IFLA_IFINDEX_LIST])
+		err = rtnl_list_dellink(tgt_net, nla_data(tb[IFLA_IFINDEX_LIST]),
+					nla_len(tb[IFLA_IFINDEX_LIST]), extack);
 	else
 		goto out;
 
@@ -3285,6 +3350,12 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	else
 		ifname[0] = '\0';
 
+	err = -EINVAL;
+	if (tb[IFLA_IFINDEX_LIST]) {
+		NL_SET_ERR_MSG(extack, "ifindex list attribute cannot be used in newlink");
+		return err;
+	}
+
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
 		dev = __dev_get_by_index(net, ifm->ifi_index);
@@ -3577,6 +3648,12 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err < 0)
 		return err;
 
+	err = -EINVAL;
+	if (tb[IFLA_IFINDEX_LIST]) {
+		NL_SET_ERR_MSG(extack, "ifindex list attribute cannot be used in getlink");
+		return err;
+	}
+
 	if (tb[IFLA_TARGET_NETNSID]) {
 		netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]);
 		tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion
  2022-01-04  8:10 [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion Lahav Schlesinger
@ 2022-01-04 14:32 ` Eric Dumazet
  2022-01-04 20:40   ` Lahav Schlesinger
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2022-01-04 14:32 UTC (permalink / raw)
  To: Lahav Schlesinger, netdev; +Cc: dsahern, kuba, idosch, nicolas.dichtel, nikolay


On 1/4/22 00:10, Lahav Schlesinger wrote:
> Under large scale, some routers are required to support tens of thousands
> of devices at once, both physical and virtual (e.g. loopbacks, tunnels,
> vrfs, etc).
> At times such routers are required to delete massive amounts of devices
> at once, such as when a factory reset is performed on the router (causing
> a deletion of all devices), or when a configuration is restored after an
> upgrade, or as a request from an operator.
>
> Currently there are 2 means of deleting devices using Netlink:
> 1. Deleting a single device (either by ifindex using ifinfomsg::ifi_index,
> or by name using IFLA_IFNAME)
> 2. Delete all device that belong to a group (using IFLA_GROUP)
>
> Deletion of devices one-by-one has poor performance on large scale of
> devices compared to "group deletion":
> After all device are handled, netdev_run_todo() is called which
> calls rcu_barrier() to finish any outstanding RCU callbacks that were
> registered during the deletion of the device, then wait until the
> refcount of all the devices is 0, then perform final cleanups.
>
> However, calling rcu_barrier() is a very costly operation, each call
> taking in the order of 10s of milliseconds.
>
> When deleting a large number of device one-by-one, rcu_barrier()
> will be called for each device being deleted.
> As an example, following benchmark deletes 10K loopback devices,
> all of which are UP and with only IPv6 LLA being configured:
>
> 1. Deleting one-by-one using 1 thread : 243 seconds
> 2. Deleting one-by-one using 10 thread: 70 seconds
> 3. Deleting one-by-one using 50 thread: 54 seconds
> 4. Deleting all using "group deletion": 30 seconds
>
> Note that even though the deletion logic takes place under the rtnl
> lock, since the call to rcu_barrier() is outside the lock we gain
> some improvements.
>
> But, while "group deletion" is the fastest, it is not suited for
> deleting large number of arbitrary devices which are unknown a head of
> time. Furthermore, moving large number of devices to a group is also a
> costly operation.
>
> This patch adds support for passing an arbitrary list of ifindex of
> devices to delete with a new IFLA_IFINDEX attribute. A single message
> may contain multiple instances of this attribute).
> This gives a more fine-grained control over which devices to delete,
> while still resulting in rcu_barrier() being called only once.
> Indeed, the timings of using this new API to delete 10K devices is
> the same as using the existing "group" deletion.
>
> Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
> ---
> v5 -> v6
>   - Convert back to single IFLA_IFINDEX_LIST attribute instead of
>     IFLA_IFINDEX
>   - Added struct net_device::bulk_delete to avoid sorting ifindex list,
>     in order to call ->dellink() only once per potentially duplicated ifindex
>     (no increase in struct size)
>   - Make sure IFLA_IFINDEX_LIST cannot be used in
>     setlink()/newlink()/getlink()
>
> v4 -> v5
>   - Don't call ->dellink() multiple times if device is duplicated.
>
> v3 -> v4
>   - Change single IFLA_INDEX_LIST into multiple IFLA_IFINDEX
>   - Fail if passing both IFLA_GROUP and at least one IFLA_IFNEX
>
> v2 -> v3
>   - Rename 'ifindex_list' to 'ifindices', and pass it as int*
>   - Clamp 'ops' variable in second loop.
>
> v1 -> v2
>   - Unset 'len' of IFLA_IFINDEX_LIST in policy.
>   - Use __dev_get_by_index() instead of n^2 loop.
>   - Return -ENODEV if any ifindex is not present.
>   - Saved devices in an array.
>   - Fix formatting.
>
>   include/linux/netdevice.h    |  3 ++
>   include/uapi/linux/if_link.h |  1 +
>   net/core/rtnetlink.c         | 77 ++++++++++++++++++++++++++++++++++++
>   3 files changed, 81 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index df049864661d..c3cfbfaf7f06 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1926,6 +1926,8 @@ enum netdev_ml_priv_type {
>    *
>    *	@threaded:	napi threaded mode is enabled
>    *
> + *	@bulk_delete:	Device is marked for of bulk deletion
> + *
>    *	@net_notifier_list:	List of per-net netdev notifier block
>    *				that follow this device when it is moved
>    *				to another network namespace.
> @@ -2258,6 +2260,7 @@ struct net_device {
>   	bool			proto_down;
>   	unsigned		wol_enabled:1;
>   	unsigned		threaded:1;
> +	unsigned		bulk_delete:1;
>   
>   	struct list_head	net_notifier_list;
>   
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index eebd3894fe89..f950bf6ed025 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -348,6 +348,7 @@ enum {
>   	IFLA_PARENT_DEV_NAME,
>   	IFLA_PARENT_DEV_BUS_NAME,
>   
> +	IFLA_IFINDEX_LIST,
>   	__IFLA_MAX
>   };
>   
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index fd030e02f16d..530371767565 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1880,6 +1880,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>   	[IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
>   	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
>   	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
> +	[IFLA_IFINDEX_LIST]     = { .type = NLA_BINARY },
>   };
>   
>   static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> @@ -3009,6 +3010,11 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   		goto errout;
>   	}
>   
> +	if (tb[IFLA_IFINDEX_LIST]) {
> +		NL_SET_ERR_MSG(extack, "ifindex list attribute cannot be used in setlink");
> +		goto errout;
> +	}
> +
>   	err = do_setlink(skb, dev, ifm, extack, tb, ifname, 0);
>   errout:
>   	return err;
> @@ -3050,6 +3056,57 @@ static int rtnl_group_dellink(const struct net *net, int group)
>   	return 0;
>   }
>   
> +static int rtnl_list_dellink(struct net *net, int *ifindices, int size,
> +			     struct netlink_ext_ack *extack)
> +{
> +	const int num_devices = size / sizeof(int);
> +	struct net_device *dev;
> +	LIST_HEAD(list_kill);
> +	int i, j, ret;
> +
> +	if (size <= 0 || size % sizeof(int))
> +		return -EINVAL;
> +
> +	for (i = 0; i < num_devices; i++) {
> +		const struct rtnl_link_ops *ops;
> +
> +		ret = -ENODEV;
> +		dev = __dev_get_by_index(net, ifindices[i]);


What happens if one device is present multiple times in ifindices[] ?

This should be an error.


> +		if (!dev) {
> +			NL_SET_ERR_MSG(extack, "Unknown ifindex");
> +			goto cleanup;
> +		}
> +
> +		ret = -EOPNOTSUPP;
> +		ops = dev->rtnl_link_ops;
> +		if (!ops || !ops->dellink) {
> +			NL_SET_ERR_MSG(extack, "Device cannot be deleted");
> +			goto cleanup;
> +		}
> +
> +		dev->bulk_delete = 1;
> +	}
> +
> +	for_each_netdev(net, dev) {

This is going to be very expensive on hosts with 1 million netdev.

You should remove this dev->bulk_delete and instead use a list.

You already use @list_kill, you only need a second list and possibly 
reuse dev->unreg_list

If you do not feel confortable about reusing dev->unreg_list, add a new 
anchor (like dev->bulk_kill_list)

> +		if (dev->bulk_delete) {
> +			dev->rtnl_link_ops->dellink(dev, &list_kill);
> +			dev->bulk_delete = 0;
> +		}
> +	}
> +
> +	unregister_netdevice_many(&list_kill);
> +
> +	return 0;
> +
> +cleanup:
> +	for (j = 0; j < i; j++) {
> +		dev = __dev_get_by_index(net, ifindices[j]);
> +		dev->bulk_delete = 0;
> +	}
> +
> +	return ret;
> +}
> +
>   int rtnl_delete_link(struct net_device *dev)
>   {
>   	const struct rtnl_link_ops *ops;
> @@ -3093,6 +3150,11 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   			return PTR_ERR(tgt_net);
>   	}
>   
> +	if (tb[IFLA_GROUP] && tb[IFLA_IFINDEX_LIST]) {
> +		NL_SET_ERR_MSG(extack, "Can't pass both IFLA_GROUP and IFLA_IFINDEX_LIST");
> +		return -EOPNOTSUPP;
> +	}
> +
>   	err = -EINVAL;
>   	ifm = nlmsg_data(nlh);
>   	if (ifm->ifi_index > 0)
> @@ -3102,6 +3164,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   				   tb[IFLA_ALT_IFNAME], NULL);
>   	else if (tb[IFLA_GROUP])
>   		err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
> +	else if (tb[IFLA_IFINDEX_LIST])
> +		err = rtnl_list_dellink(tgt_net, nla_data(tb[IFLA_IFINDEX_LIST]),
> +					nla_len(tb[IFLA_IFINDEX_LIST]), extack);
>   	else
>   		goto out;
>   
> @@ -3285,6 +3350,12 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	else
>   		ifname[0] = '\0';
>   
> +	err = -EINVAL;
> +	if (tb[IFLA_IFINDEX_LIST]) {
> +		NL_SET_ERR_MSG(extack, "ifindex list attribute cannot be used in newlink");
> +		return err;
> +	}
> +
>   	ifm = nlmsg_data(nlh);
>   	if (ifm->ifi_index > 0)
>   		dev = __dev_get_by_index(net, ifm->ifi_index);
> @@ -3577,6 +3648,12 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>   	if (err < 0)
>   		return err;
>   
> +	err = -EINVAL;
> +	if (tb[IFLA_IFINDEX_LIST]) {
> +		NL_SET_ERR_MSG(extack, "ifindex list attribute cannot be used in getlink");
> +		return err;
> +	}
> +
>   	if (tb[IFLA_TARGET_NETNSID]) {
>   		netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]);
>   		tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion
  2022-01-04 14:32 ` Eric Dumazet
@ 2022-01-04 20:40   ` Lahav Schlesinger
  2022-01-04 21:12     ` Jakub Kicinski
  2022-01-05  0:18     ` David Ahern
  0 siblings, 2 replies; 9+ messages in thread
From: Lahav Schlesinger @ 2022-01-04 20:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, dsahern, kuba, idosch, nicolas.dichtel, nikolay

On Tue, Jan 04, 2022 at 06:32:31AM -0800, Eric Dumazet wrote:
> CAUTION: External E-Mail - Use caution with links and attachments
>
>
> On 1/4/22 00:10, Lahav Schlesinger wrote:
> > Under large scale, some routers are required to support tens of thousands
> > of devices at once, both physical and virtual (e.g. loopbacks, tunnels,
> > vrfs, etc).
> > At times such routers are required to delete massive amounts of devices
> > at once, such as when a factory reset is performed on the router (causing
> > a deletion of all devices), or when a configuration is restored after an
> > upgrade, or as a request from an operator.
> >
> > Currently there are 2 means of deleting devices using Netlink:
> > 1. Deleting a single device (either by ifindex using ifinfomsg::ifi_index,
> > or by name using IFLA_IFNAME)
> > 2. Delete all device that belong to a group (using IFLA_GROUP)
> >
> > Deletion of devices one-by-one has poor performance on large scale of
> > devices compared to "group deletion":
> > After all device are handled, netdev_run_todo() is called which
> > calls rcu_barrier() to finish any outstanding RCU callbacks that were
> > registered during the deletion of the device, then wait until the
> > refcount of all the devices is 0, then perform final cleanups.
> >
> > However, calling rcu_barrier() is a very costly operation, each call
> > taking in the order of 10s of milliseconds.
> >
> > When deleting a large number of device one-by-one, rcu_barrier()
> > will be called for each device being deleted.
> > As an example, following benchmark deletes 10K loopback devices,
> > all of which are UP and with only IPv6 LLA being configured:
> >
> > 1. Deleting one-by-one using 1 thread : 243 seconds
> > 2. Deleting one-by-one using 10 thread: 70 seconds
> > 3. Deleting one-by-one using 50 thread: 54 seconds
> > 4. Deleting all using "group deletion": 30 seconds
> >
> > Note that even though the deletion logic takes place under the rtnl
> > lock, since the call to rcu_barrier() is outside the lock we gain
> > some improvements.
> >
> > But, while "group deletion" is the fastest, it is not suited for
> > deleting large number of arbitrary devices which are unknown a head of
> > time. Furthermore, moving large number of devices to a group is also a
> > costly operation.
> >
> > This patch adds support for passing an arbitrary list of ifindex of
> > devices to delete with a new IFLA_IFINDEX attribute. A single message
> > may contain multiple instances of this attribute).
> > This gives a more fine-grained control over which devices to delete,
> > while still resulting in rcu_barrier() being called only once.
> > Indeed, the timings of using this new API to delete 10K devices is
> > the same as using the existing "group" deletion.
> >
> > Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
> > ---
> > v5 -> v6
> >   - Convert back to single IFLA_IFINDEX_LIST attribute instead of
> >     IFLA_IFINDEX
> >   - Added struct net_device::bulk_delete to avoid sorting ifindex list,
> >     in order to call ->dellink() only once per potentially duplicated ifindex
> >     (no increase in struct size)
> >   - Make sure IFLA_IFINDEX_LIST cannot be used in
> >     setlink()/newlink()/getlink()
> >
> > v4 -> v5
> >   - Don't call ->dellink() multiple times if device is duplicated.
> >
> > v3 -> v4
> >   - Change single IFLA_INDEX_LIST into multiple IFLA_IFINDEX
> >   - Fail if passing both IFLA_GROUP and at least one IFLA_IFNEX
> >
> > v2 -> v3
> >   - Rename 'ifindex_list' to 'ifindices', and pass it as int*
> >   - Clamp 'ops' variable in second loop.
> >
> > v1 -> v2
> >   - Unset 'len' of IFLA_IFINDEX_LIST in policy.
> >   - Use __dev_get_by_index() instead of n^2 loop.
> >   - Return -ENODEV if any ifindex is not present.
> >   - Saved devices in an array.
> >   - Fix formatting.
> >
> >   include/linux/netdevice.h    |  3 ++
> >   include/uapi/linux/if_link.h |  1 +
> >   net/core/rtnetlink.c         | 77 ++++++++++++++++++++++++++++++++++++
> >   3 files changed, 81 insertions(+)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index df049864661d..c3cfbfaf7f06 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1926,6 +1926,8 @@ enum netdev_ml_priv_type {
> >    *
> >    *  @threaded:      napi threaded mode is enabled
> >    *
> > + *   @bulk_delete:   Device is marked for of bulk deletion
> > + *
> >    *  @net_notifier_list:     List of per-net netdev notifier block
> >    *                          that follow this device when it is moved
> >    *                          to another network namespace.
> > @@ -2258,6 +2260,7 @@ struct net_device {
> >       bool                    proto_down;
> >       unsigned                wol_enabled:1;
> >       unsigned                threaded:1;
> > +     unsigned                bulk_delete:1;
> >
> >       struct list_head        net_notifier_list;
> >
> > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> > index eebd3894fe89..f950bf6ed025 100644
> > --- a/include/uapi/linux/if_link.h
> > +++ b/include/uapi/linux/if_link.h
> > @@ -348,6 +348,7 @@ enum {
> >       IFLA_PARENT_DEV_NAME,
> >       IFLA_PARENT_DEV_BUS_NAME,
> >
> > +     IFLA_IFINDEX_LIST,
> >       __IFLA_MAX
> >   };
> >
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index fd030e02f16d..530371767565 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -1880,6 +1880,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> >       [IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
> >       [IFLA_NEW_IFINDEX]      = NLA_POLICY_MIN(NLA_S32, 1),
> >       [IFLA_PARENT_DEV_NAME]  = { .type = NLA_NUL_STRING },
> > +     [IFLA_IFINDEX_LIST]     = { .type = NLA_BINARY },
> >   };
> >
> >   static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> > @@ -3009,6 +3010,11 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >               goto errout;
> >       }
> >
> > +     if (tb[IFLA_IFINDEX_LIST]) {
> > +             NL_SET_ERR_MSG(extack, "ifindex list attribute cannot be used in setlink");
> > +             goto errout;
> > +     }
> > +
> >       err = do_setlink(skb, dev, ifm, extack, tb, ifname, 0);
> >   errout:
> >       return err;
> > @@ -3050,6 +3056,57 @@ static int rtnl_group_dellink(const struct net *net, int group)
> >       return 0;
> >   }
> >
> > +static int rtnl_list_dellink(struct net *net, int *ifindices, int size,
> > +                          struct netlink_ext_ack *extack)
> > +{
> > +     const int num_devices = size / sizeof(int);
> > +     struct net_device *dev;
> > +     LIST_HEAD(list_kill);
> > +     int i, j, ret;
> > +
> > +     if (size <= 0 || size % sizeof(int))
> > +             return -EINVAL;
> > +
> > +     for (i = 0; i < num_devices; i++) {
> > +             const struct rtnl_link_ops *ops;
> > +
> > +             ret = -ENODEV;
> > +             dev = __dev_get_by_index(net, ifindices[i]);
>
>
> What happens if one device is present multiple times in ifindices[] ?
>
> This should be an error.

Right, I'll add a check.

>
>
> > +             if (!dev) {
> > +                     NL_SET_ERR_MSG(extack, "Unknown ifindex");
> > +                     goto cleanup;
> > +             }
> > +
> > +             ret = -EOPNOTSUPP;
> > +             ops = dev->rtnl_link_ops;
> > +             if (!ops || !ops->dellink) {
> > +                     NL_SET_ERR_MSG(extack, "Device cannot be deleted");
> > +                     goto cleanup;
> > +             }
> > +
> > +             dev->bulk_delete = 1;
> > +     }
> > +
> > +     for_each_netdev(net, dev) {
>
> This is going to be very expensive on hosts with 1 million netdev.
>
> You should remove this dev->bulk_delete and instead use a list.
>
> You already use @list_kill, you only need a second list and possibly
> reuse dev->unreg_list
>
> If you do not feel confortable about reusing dev->unreg_list, add a new
> anchor (like dev->bulk_kill_list)

I tried using dev->unreg_list but it doesn't work e.g. for veth pairs
where ->dellink() of a veth automatically adds the peer. Therefore if
@ifindices contains both peers then the first ->dellink() will remove
the next device from @list_kill. This caused a page fault when
@list_kill was further iterated on.

I opted to add a flag to struct net_device as David suggested in order
to avoid increasing sizeof(struct net_device), but perhaps it's not that
big of an issue.
If it's fine then I'll update it.

>
> > +             if (dev->bulk_delete) {
> > +                     dev->rtnl_link_ops->dellink(dev, &list_kill);
> > +                     dev->bulk_delete = 0;
> > +             }
> > +     }
> > +
> > +     unregister_netdevice_many(&list_kill);
> > +
> > +     return 0;
> > +
> > +cleanup:
> > +     for (j = 0; j < i; j++) {
> > +             dev = __dev_get_by_index(net, ifindices[j]);
> > +             dev->bulk_delete = 0;
> > +     }
> > +
> > +     return ret;
> > +}
> > +
> >   int rtnl_delete_link(struct net_device *dev)
> >   {
> >       const struct rtnl_link_ops *ops;
> > @@ -3093,6 +3150,11 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >                       return PTR_ERR(tgt_net);
> >       }
> >
> > +     if (tb[IFLA_GROUP] && tb[IFLA_IFINDEX_LIST]) {
> > +             NL_SET_ERR_MSG(extack, "Can't pass both IFLA_GROUP and IFLA_IFINDEX_LIST");
> > +             return -EOPNOTSUPP;
> > +     }
> > +
> >       err = -EINVAL;
> >       ifm = nlmsg_data(nlh);
> >       if (ifm->ifi_index > 0)
> > @@ -3102,6 +3164,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >                                  tb[IFLA_ALT_IFNAME], NULL);
> >       else if (tb[IFLA_GROUP])
> >               err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
> > +     else if (tb[IFLA_IFINDEX_LIST])
> > +             err = rtnl_list_dellink(tgt_net, nla_data(tb[IFLA_IFINDEX_LIST]),
> > +                                     nla_len(tb[IFLA_IFINDEX_LIST]), extack);
> >       else
> >               goto out;
> >
> > @@ -3285,6 +3350,12 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >       else
> >               ifname[0] = '\0';
> >
> > +     err = -EINVAL;
> > +     if (tb[IFLA_IFINDEX_LIST]) {
> > +             NL_SET_ERR_MSG(extack, "ifindex list attribute cannot be used in newlink");
> > +             return err;
> > +     }
> > +
> >       ifm = nlmsg_data(nlh);
> >       if (ifm->ifi_index > 0)
> >               dev = __dev_get_by_index(net, ifm->ifi_index);
> > @@ -3577,6 +3648,12 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >       if (err < 0)
> >               return err;
> >
> > +     err = -EINVAL;
> > +     if (tb[IFLA_IFINDEX_LIST]) {
> > +             NL_SET_ERR_MSG(extack, "ifindex list attribute cannot be used in getlink");
> > +             return err;
> > +     }
> > +
> >       if (tb[IFLA_TARGET_NETNSID]) {
> >               netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]);
> >               tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion
  2022-01-04 20:40   ` Lahav Schlesinger
@ 2022-01-04 21:12     ` Jakub Kicinski
  2022-01-05  0:19       ` David Ahern
  2022-01-05  0:18     ` David Ahern
  1 sibling, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2022-01-04 21:12 UTC (permalink / raw)
  To: Lahav Schlesinger
  Cc: Eric Dumazet, netdev, dsahern, idosch, nicolas.dichtel, nikolay

On Tue, 4 Jan 2022 22:40:43 +0200 Lahav Schlesinger wrote:
> > This is going to be very expensive on hosts with 1 million netdev.
> >
> > You should remove this dev->bulk_delete and instead use a list.
> >
> > You already use @list_kill, you only need a second list and possibly
> > reuse dev->unreg_list
> >
> > If you do not feel confortable about reusing dev->unreg_list, add a new
> > anchor (like dev->bulk_kill_list)  
> 
> I tried using dev->unreg_list but it doesn't work e.g. for veth pairs
> where ->dellink() of a veth automatically adds the peer. Therefore if
> @ifindices contains both peers then the first ->dellink() will remove
> the next device from @list_kill. This caused a page fault when
> @list_kill was further iterated on.
> 
> I opted to add a flag to struct net_device as David suggested in order
> to avoid increasing sizeof(struct net_device), but perhaps it's not that
> big of an issue.
> If it's fine then I'll update it.

With the dev->bulk_delete flag and raw array instead of attributes you
can go back to the version of the code which stores dev pointers in a
temp kmalloc'd array, right?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion
  2022-01-04 20:40   ` Lahav Schlesinger
  2022-01-04 21:12     ` Jakub Kicinski
@ 2022-01-05  0:18     ` David Ahern
  2022-01-05 16:09       ` David Ahern
  1 sibling, 1 reply; 9+ messages in thread
From: David Ahern @ 2022-01-05  0:18 UTC (permalink / raw)
  To: Lahav Schlesinger, Eric Dumazet
  Cc: netdev, kuba, idosch, nicolas.dichtel, nikolay

On 1/4/22 1:40 PM, Lahav Schlesinger wrote:
> I tried using dev->unreg_list but it doesn't work e.g. for veth pairs
> where ->dellink() of a veth automatically adds the peer. Therefore if
> @ifindices contains both peers then the first ->dellink() will remove
> the next device from @list_kill. This caused a page fault when
> @list_kill was further iterated on.

make sure you add a selftest for the bulk delete and cover cases with
veth, vlan, vrf, dummy, bridge, ...

> 
> I opted to add a flag to struct net_device as David suggested in order
> to avoid increasing sizeof(struct net_device), but perhaps it's not that
> big of an issue.
> If it's fine then I'll update it.

I was hoping to avoid bloating net_device with 16B that has such a
limited need. In one config I use, net_device is 2048B - a nice size and
an additional 16B makes netdevs much more expensive. A ubuntu config
comes in at 2368, so not really an issue there.

Staring at the existing list_head options close_list seems like a
candidate for a union with bulk_kill_list. If that does not work we can
add a new one.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion
  2022-01-04 21:12     ` Jakub Kicinski
@ 2022-01-05  0:19       ` David Ahern
  2022-01-06 19:51         ` Lahav Schlesinger
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2022-01-05  0:19 UTC (permalink / raw)
  To: Jakub Kicinski, Lahav Schlesinger
  Cc: Eric Dumazet, netdev, idosch, nicolas.dichtel, nikolay

On 1/4/22 2:12 PM, Jakub Kicinski wrote:
> With the dev->bulk_delete flag and raw array instead of attributes you
> can go back to the version of the code which stores dev pointers in a
> temp kmalloc'd array, right?

missed this response ... that should work as well once duplicates are
caught.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion
  2022-01-05  0:18     ` David Ahern
@ 2022-01-05 16:09       ` David Ahern
  2022-01-06 19:54         ` Lahav Schlesinger
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2022-01-05 16:09 UTC (permalink / raw)
  To: Lahav Schlesinger, Eric Dumazet
  Cc: netdev, kuba, idosch, nicolas.dichtel, nikolay

On 1/4/22 5:18 PM, David Ahern wrote:
> On 1/4/22 1:40 PM, Lahav Schlesinger wrote:
>> I tried using dev->unreg_list but it doesn't work e.g. for veth pairs
>> where ->dellink() of a veth automatically adds the peer. Therefore if
>> @ifindices contains both peers then the first ->dellink() will remove
>> the next device from @list_kill. This caused a page fault when
>> @list_kill was further iterated on.
> 
> make sure you add a selftest for the bulk delete and cover cases with
> veth, vlan, vrf, dummy, bridge, ...
> 

BTW, delete of a netdev clears out neighbor entries, network addresses,
routes, hardware updates, etc. with lots of notifications to userspace.
Bulk delete of 1000s of netdevs is going to end up holding the rtnl for
a "long" time. It would be good for the selftests to include a cases
with lots of neighbor entries, routes, addresses.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion
  2022-01-05  0:19       ` David Ahern
@ 2022-01-06 19:51         ` Lahav Schlesinger
  0 siblings, 0 replies; 9+ messages in thread
From: Lahav Schlesinger @ 2022-01-06 19:51 UTC (permalink / raw)
  To: David Ahern
  Cc: Jakub Kicinski, Eric Dumazet, netdev, idosch, nicolas.dichtel, nikolay

On Tue, Jan 04, 2022 at 05:19:30PM -0700, David Ahern wrote:
> CAUTION: External E-Mail - Use caution with links and attachments
>
>
> On 1/4/22 2:12 PM, Jakub Kicinski wrote:
> > With the dev->bulk_delete flag and raw array instead of attributes you
> > can go back to the version of the code which stores dev pointers in a
> > temp kmalloc'd array, right?
>
> missed this response ... that should work as well once duplicates are
> caught.

I liked the idea that we can avoid increasing sizeof(struct net_device)
for this feature. In this case I see the final implementation as looking
something like this:

dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);

for (i = 0; i < num_devices; i++) {
        dev = __dev_get_by_index(net, ifindices[i]);

        ...

        if (dev->bulk_delete)
                goto cleanup;

        dev->bulk_delete = 1;
        dev_list[i] = dev;
}

for (i = 0; i < num_devices; i++) {
        dev = dev_list[i];
        dev->rtnl_link_ops->dellink(dev, &list_kill);
        dev->bulk_delete = 0;
}

kfree(dev_list);

unregister_netdevice_many(&list_kill);

return 0;

cleanup:
        for (j = 0; j < i; j++) {
                dev_list[j]->bulk_delete = 0;
        }
        kfree(dev_list);


Including the explicit checks in rtnl_setlink() __rtnl_newlink() and
rtnl_getlink() for returning an error if IFLA_IFINDEX_LIST is passed.

If it looks okay I'll send a v7.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion
  2022-01-05 16:09       ` David Ahern
@ 2022-01-06 19:54         ` Lahav Schlesinger
  0 siblings, 0 replies; 9+ messages in thread
From: Lahav Schlesinger @ 2022-01-06 19:54 UTC (permalink / raw)
  To: David Ahern; +Cc: Eric Dumazet, netdev, kuba, idosch, nicolas.dichtel, nikolay

On Wed, Jan 05, 2022 at 09:09:34AM -0700, David Ahern wrote:
> CAUTION: External E-Mail - Use caution with links and attachments
>
>
> On 1/4/22 5:18 PM, David Ahern wrote:
> > On 1/4/22 1:40 PM, Lahav Schlesinger wrote:
> >> I tried using dev->unreg_list but it doesn't work e.g. for veth pairs
> >> where ->dellink() of a veth automatically adds the peer. Therefore if
> >> @ifindices contains both peers then the first ->dellink() will remove
> >> the next device from @list_kill. This caused a page fault when
> >> @list_kill was further iterated on.
> >
> > make sure you add a selftest for the bulk delete and cover cases with
> > veth, vlan, vrf, dummy, bridge, ...
> >
>
> BTW, delete of a netdev clears out neighbor entries, network addresses,
> routes, hardware updates, etc. with lots of notifications to userspace.
> Bulk delete of 1000s of netdevs is going to end up holding the rtnl for
> a "long" time. It would be good for the selftests to include a cases
> with lots of neighbor entries, routes, addresses.

Ack. I'll add such tests to v7. What numbers you have in mind for the
number of routes/neighbours etc? I suppose we don't want the tests to
run for extremely long times.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-01-06 19:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-04  8:10 [PATCH net-next v6] rtnetlink: Support fine-grained netdevice bulk deletion Lahav Schlesinger
2022-01-04 14:32 ` Eric Dumazet
2022-01-04 20:40   ` Lahav Schlesinger
2022-01-04 21:12     ` Jakub Kicinski
2022-01-05  0:19       ` David Ahern
2022-01-06 19:51         ` Lahav Schlesinger
2022-01-05  0:18     ` David Ahern
2022-01-05 16:09       ` David Ahern
2022-01-06 19:54         ` Lahav Schlesinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.