All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
@ 2021-12-05  9:36 Lahav Schlesinger
  2021-12-05 10:28 ` Nikolay Aleksandrov
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Lahav Schlesinger @ 2021-12-05  9:36 UTC (permalink / raw)
  To: netdev; +Cc: kuba, dsahern, nikolay

Under large scale, some routers are required to support tens of thousands
of devices at once, both physical and virtual (e.g. loopbacks, tunnels,
vrfs, etc).
At times such routers are required to delete massive amounts of devices
at once, such as when a factory reset is performed on the router (causing
a deletion of all devices), or when a configuration is restored after an
upgrade, or as a request from an operator.

Currently there are 2 means of deleting devices using Netlink:
1. Deleting a single device (either by ifindex using ifinfomsg::ifi_index,
or by name using IFLA_IFNAME)
2. Delete all device that belong to a group (using IFLA_GROUP)

Deletion of devices one-by-one has poor performance on large scale of
devices compared to "group deletion":
After all device are handled, netdev_run_todo() is called which
calls rcu_barrier() to finish any outstanding RCU callbacks that were
registered during the deletion of the device, then wait until the
refcount of all the devices is 0, then perform final cleanups.

However, calling rcu_barrier() is a very costly operation, each call
taking in the order of 10s of milliseconds.

When deleting a large number of device one-by-one, rcu_barrier()
will be called for each device being deleted.
As an example, following benchmark deletes 10K loopback devices,
all of which are UP and with only IPv6 LLA being configured:

1. Deleting one-by-one using 1 thread : 243 seconds
2. Deleting one-by-one using 10 thread: 70 seconds
3. Deleting one-by-one using 50 thread: 54 seconds
4. Deleting all using "group deletion": 30 seconds

Note that even though the deletion logic takes place under the rtnl
lock, since the call to rcu_barrier() is outside the lock we gain
some improvements.

But, while "group deletion" is the fastest, it is not suited for
deleting large number of arbitrary devices which are unknown a head of
time. Furthermore, moving large number of devices to a group is also a
costly operation.

This patch adds support for passing an arbitrary list of ifindex of
devices to delete with a new IFLA_IFINDEX attribute. A single message
may contain multiple instances of this attribute).
This gives a more fine-grained control over which devices to delete,
while still resulting in rcu_barrier() being called only once.
Indeed, the timings of using this new API to delete 10K devices is
the same as using the existing "group" deletion.

Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
---
v4 -> v5
 - Call ->dellink() only once for duplicated devices.

v3 -> v4
 - Change single IFLA_INDEX_LIST into multiple IFLA_IFINDEX
 - Fail if passing both IFLA_GROUP and at least one IFLA_IFNEX

v2 -> v3
 - Rename 'ifindex_list' to 'ifindices', and pass it as int*
 - Clamp 'ops' variable in second loop.

v1 -> v2
 - Unset 'len' of IFLA_IFINDEX_LIST in policy.
 - Use __dev_get_by_index() instead of n^2 loop.
 - Return -ENODEV if any ifindex is not present.
 - Saved devices in an array.
 - Fix formatting.

 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 82 ++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..68fcde9c0c5e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -348,6 +348,7 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 
+	IFLA_IFINDEX,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index fd030e02f16d..5165cc699d97 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -37,6 +37,7 @@
 #include <linux/pci.h>
 #include <linux/etherdevice.h>
 #include <linux/bpf.h>
+#include <linux/sort.h>
 
 #include <linux/uaccess.h>
 
@@ -1880,6 +1881,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
 	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
+	[IFLA_IFINDEX]		= { .type = NLA_S32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -3050,6 +3052,78 @@ static int rtnl_group_dellink(const struct net *net, int group)
 	return 0;
 }
 
+static int dev_ifindex_cmp(const void *a, const void *b)
+{
+	struct net_device * const *dev1 = a, * const *dev2 = b;
+
+	return (*dev1)->ifindex - (*dev2)->ifindex;
+}
+
+static int rtnl_ifindex_dellink(struct net *net, struct nlattr *head, int len,
+				struct netlink_ext_ack *extack)
+{
+	int i = 0, num_devices = 0, rem;
+	struct net_device **dev_list;
+	const struct nlattr *nla;
+	LIST_HEAD(list_kill);
+	int ret;
+
+	nla_for_each_attr(nla, head, len, rem) {
+		if (nla_type(nla) == IFLA_IFINDEX)
+			num_devices++;
+	}
+
+	dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
+	if (!dev_list)
+		return -ENOMEM;
+
+	nla_for_each_attr(nla, head, len, rem) {
+		const struct rtnl_link_ops *ops;
+		struct net_device *dev;
+		int ifindex;
+
+		if (nla_type(nla) != IFLA_IFINDEX)
+			continue;
+
+		ifindex = nla_get_s32(nla);
+		ret = -ENODEV;
+		dev = __dev_get_by_index(net, ifindex);
+		if (!dev) {
+			NL_SET_ERR_MSG_ATTR(extack, nla, "Unknown ifindex");
+			goto out_free;
+		}
+
+		ret = -EOPNOTSUPP;
+		ops = dev->rtnl_link_ops;
+		if (!ops || !ops->dellink) {
+			NL_SET_ERR_MSG_ATTR(extack, nla, "Device cannot be deleted");
+			goto out_free;
+		}
+
+		dev_list[i++] = dev;
+	}
+
+	/* Sort devices, so we could skip duplicates */
+	sort(dev_list, num_devices, sizeof(*dev_list), dev_ifindex_cmp, NULL);
+
+	for (i = 0; i < num_devices; i++) {
+		struct net_device *dev = dev_list[i];
+
+		if (i != 0 && dev_list[i - 1]->ifindex == dev->ifindex)
+			continue;
+
+		dev->rtnl_link_ops->dellink(dev, &list_kill);
+	}
+
+	unregister_netdevice_many(&list_kill);
+
+	ret = 0;
+
+out_free:
+	kfree(dev_list);
+	return ret;
+}
+
 int rtnl_delete_link(struct net_device *dev)
 {
 	const struct rtnl_link_ops *ops;
@@ -3093,6 +3167,11 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			return PTR_ERR(tgt_net);
 	}
 
+	if (tb[IFLA_GROUP] && tb[IFLA_IFINDEX]) {
+		NL_SET_ERR_MSG(extack, "Can't pass both IFLA_GROUP and IFLA_IFINDEX");
+		return -EOPNOTSUPP;
+	}
+
 	err = -EINVAL;
 	ifm = nlmsg_data(nlh);
 	if (ifm->ifi_index > 0)
@@ -3102,6 +3181,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 				   tb[IFLA_ALT_IFNAME], NULL);
 	else if (tb[IFLA_GROUP])
 		err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
+	else if (tb[IFLA_IFINDEX])
+		err = rtnl_ifindex_dellink(tgt_net, nlmsg_attrdata(nlh, sizeof(*ifm)),
+					   nlmsg_attrlen(nlh, sizeof(*ifm)), extack);
 	else
 		goto out;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-05  9:36 [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion Lahav Schlesinger
@ 2021-12-05 10:28 ` Nikolay Aleksandrov
  2021-12-06  8:25 ` Nicolas Dichtel
  2021-12-07  4:12 ` David Ahern
  2 siblings, 0 replies; 18+ messages in thread
From: Nikolay Aleksandrov @ 2021-12-05 10:28 UTC (permalink / raw)
  To: Lahav Schlesinger, netdev; +Cc: kuba, dsahern, Ido Schimmel

On 05/12/2021 11:36, Lahav Schlesinger wrote:
[snip]
> But, while "group deletion" is the fastest, it is not suited for
> deleting large number of arbitrary devices which are unknown a head of
> time. Furthermore, moving large number of devices to a group is also a
> costly operation.
> 
> This patch adds support for passing an arbitrary list of ifindex of
> devices to delete with a new IFLA_IFINDEX attribute. A single message
> may contain multiple instances of this attribute).
> This gives a more fine-grained control over which devices to delete,
> while still resulting in rcu_barrier() being called only once.
> Indeed, the timings of using this new API to delete 10K devices is
> the same as using the existing "group" deletion.
> 
> Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
> ---
[snip]

Note that Ido raised a good point[1] about group operations and their efficiency
as a reply to v4 of the patch.

+CC Ido

[1] https://marc.info/?l=linux-netdev&m=163869796129581&w=2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-05  9:36 [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion Lahav Schlesinger
  2021-12-05 10:28 ` Nikolay Aleksandrov
@ 2021-12-06  8:25 ` Nicolas Dichtel
  2021-12-06 15:20   ` David Ahern
  2021-12-07 12:48   ` Lahav Schlesinger
  2021-12-07  4:12 ` David Ahern
  2 siblings, 2 replies; 18+ messages in thread
From: Nicolas Dichtel @ 2021-12-06  8:25 UTC (permalink / raw)
  To: Lahav Schlesinger, netdev; +Cc: kuba, dsahern, nikolay

Le 05/12/2021 à 10:36, Lahav Schlesinger a écrit :
Some comments below, but please, keep the people who replied to previous
versions of this patch in cc.

[snip]

> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index eebd3894fe89..68fcde9c0c5e 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -348,6 +348,7 @@ enum {
>  	IFLA_PARENT_DEV_NAME,
>  	IFLA_PARENT_DEV_BUS_NAME,
>  
> +	IFLA_IFINDEX,
nit: maybe the previous blank line sit better after this new attribute (and
before __IFLA_MAX)?

>  	__IFLA_MAX
>  };
>  
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index fd030e02f16d..5165cc699d97 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -37,6 +37,7 @@
>  #include <linux/pci.h>
>  #include <linux/etherdevice.h>
>  #include <linux/bpf.h>
> +#include <linux/sort.h>
>  
>  #include <linux/uaccess.h>
>  
> @@ -1880,6 +1881,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>  	[IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
>  	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
>  	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
> +	[IFLA_IFINDEX]		= { .type = NLA_S32 },
Same policy than IFLA_NEW_IFINDEX to refuse negative ifindex.

>  };
>  
>  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> @@ -3050,6 +3052,78 @@ static int rtnl_group_dellink(const struct net *net, int group)
>  	return 0;
>  }
>  
> +static int dev_ifindex_cmp(const void *a, const void *b)
> +{
> +	struct net_device * const *dev1 = a, * const *dev2 = b;
> +
> +	return (*dev1)->ifindex - (*dev2)->ifindex;
> +}
> +
> +static int rtnl_ifindex_dellink(struct net *net, struct nlattr *head, int len,
> +				struct netlink_ext_ack *extack)
> +{
> +	int i = 0, num_devices = 0, rem;
> +	struct net_device **dev_list;
> +	const struct nlattr *nla;
> +	LIST_HEAD(list_kill);
> +	int ret;
> +
> +	nla_for_each_attr(nla, head, len, rem) {
> +		if (nla_type(nla) == IFLA_IFINDEX)
> +			num_devices++;
> +	}
> +
> +	dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
> +	if (!dev_list)
> +		return -ENOMEM;
> +
> +	nla_for_each_attr(nla, head, len, rem) {
> +		const struct rtnl_link_ops *ops;
> +		struct net_device *dev;
> +		int ifindex;
> +
> +		if (nla_type(nla) != IFLA_IFINDEX)
> +			continue;
> +
> +		ifindex = nla_get_s32(nla);
> +		ret = -ENODEV;
> +		dev = __dev_get_by_index(net, ifindex);
> +		if (!dev) {
> +			NL_SET_ERR_MSG_ATTR(extack, nla, "Unknown ifindex");
It would be nice to have the ifindex in the error message. This message does not
give more information than "ENODEV".

> +			goto out_free;
> +		}
> +
> +		ret = -EOPNOTSUPP;
> +		ops = dev->rtnl_link_ops;
> +		if (!ops || !ops->dellink) {
> +			NL_SET_ERR_MSG_ATTR(extack, nla, "Device cannot be deleted");
Same here.


Thank you,
Nicolas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-06  8:25 ` Nicolas Dichtel
@ 2021-12-06 15:20   ` David Ahern
  2021-12-07  8:27     ` Nicolas Dichtel
  2021-12-07 12:48   ` Lahav Schlesinger
  1 sibling, 1 reply; 18+ messages in thread
From: David Ahern @ 2021-12-06 15:20 UTC (permalink / raw)
  To: nicolas.dichtel, Lahav Schlesinger, netdev; +Cc: kuba, nikolay

On 12/6/21 1:25 AM, Nicolas Dichtel wrote:
>> @@ -3050,6 +3052,78 @@ static int rtnl_group_dellink(const struct net *net, int group)
>>  	return 0;
>>  }
>>  
>> +static int dev_ifindex_cmp(const void *a, const void *b)
>> +{
>> +	struct net_device * const *dev1 = a, * const *dev2 = b;
>> +
>> +	return (*dev1)->ifindex - (*dev2)->ifindex;
>> +}
>> +
>> +static int rtnl_ifindex_dellink(struct net *net, struct nlattr *head, int len,
>> +				struct netlink_ext_ack *extack)
>> +{
>> +	int i = 0, num_devices = 0, rem;
>> +	struct net_device **dev_list;
>> +	const struct nlattr *nla;
>> +	LIST_HEAD(list_kill);
>> +	int ret;
>> +
>> +	nla_for_each_attr(nla, head, len, rem) {
>> +		if (nla_type(nla) == IFLA_IFINDEX)
>> +			num_devices++;
>> +	}
>> +
>> +	dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
>> +	if (!dev_list)
>> +		return -ENOMEM;
>> +
>> +	nla_for_each_attr(nla, head, len, rem) {
>> +		const struct rtnl_link_ops *ops;
>> +		struct net_device *dev;
>> +		int ifindex;
>> +
>> +		if (nla_type(nla) != IFLA_IFINDEX)
>> +			continue;
>> +
>> +		ifindex = nla_get_s32(nla);
>> +		ret = -ENODEV;
>> +		dev = __dev_get_by_index(net, ifindex);
>> +		if (!dev) {
>> +			NL_SET_ERR_MSG_ATTR(extack, nla, "Unknown ifindex");
> It would be nice to have the ifindex in the error message. This message does not
> give more information than "ENODEV".

extack infra does not allow dynamic messages. It does point to the
location of the bad index, so userspace could figure it out.

> 
>> +			goto out_free;
>> +		}
>> +
>> +		ret = -EOPNOTSUPP;
>> +		ops = dev->rtnl_link_ops;
>> +		if (!ops || !ops->dellink) {
>> +			NL_SET_ERR_MSG_ATTR(extack, nla, "Device cannot be deleted");
> Same here.
> 
> 
> Thank you,
> Nicolas
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-05  9:36 [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion Lahav Schlesinger
  2021-12-05 10:28 ` Nikolay Aleksandrov
  2021-12-06  8:25 ` Nicolas Dichtel
@ 2021-12-07  4:12 ` David Ahern
  2021-12-07 15:37   ` Jakub Kicinski
  2021-12-08 21:47   ` Lahav Schlesinger
  2 siblings, 2 replies; 18+ messages in thread
From: David Ahern @ 2021-12-07  4:12 UTC (permalink / raw)
  To: Lahav Schlesinger, netdev; +Cc: kuba, nikolay

On 12/5/21 2:36 AM, Lahav Schlesinger wrote:
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index fd030e02f16d..5165cc699d97 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -37,6 +37,7 @@
>  #include <linux/pci.h>
>  #include <linux/etherdevice.h>
>  #include <linux/bpf.h>
> +#include <linux/sort.h>
>  
>  #include <linux/uaccess.h>
>  
> @@ -1880,6 +1881,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>  	[IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
>  	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
>  	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
> +	[IFLA_IFINDEX]		= { .type = NLA_S32 },

you need to make sure this new attribute can not be used in setlink
requests or getlink.

>  };
>  
>  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> @@ -3050,6 +3052,78 @@ static int rtnl_group_dellink(const struct net *net, int group)
>  	return 0;
>  }
>  
> +static int dev_ifindex_cmp(const void *a, const void *b)
> +{
> +	struct net_device * const *dev1 = a, * const *dev2 = b;

const struct net_device *dev1 = 1, *dev2 = b;

> +
> +	return (*dev1)->ifindex - (*dev2)->ifindex;
> +}
> +
> +static int rtnl_ifindex_dellink(struct net *net, struct nlattr *head, int len,
> +				struct netlink_ext_ack *extack)
> +{
> +	int i = 0, num_devices = 0, rem;
> +	struct net_device **dev_list;
> +	const struct nlattr *nla;
> +	LIST_HEAD(list_kill);
> +	int ret;
> +
> +	nla_for_each_attr(nla, head, len, rem) {
> +		if (nla_type(nla) == IFLA_IFINDEX)
> +			num_devices++;
> +	}


The need to walk the list twice (3 really with the sort) to means the
array solution is better.

> +
> +	dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
> +	if (!dev_list)
> +		return -ENOMEM;
> +
> +	nla_for_each_attr(nla, head, len, rem) {
> +		const struct rtnl_link_ops *ops;
> +		struct net_device *dev;
> +		int ifindex;
> +
> +		if (nla_type(nla) != IFLA_IFINDEX)
> +			continue;

and this business too. We have arrays in other places
(net/ipv4/nexthop.c), so this is not the first.


> +
> +		ifindex = nla_get_s32(nla);
> +		ret = -ENODEV;
> +		dev = __dev_get_by_index(net, ifindex);
> +		if (!dev) {
> +			NL_SET_ERR_MSG_ATTR(extack, nla, "Unknown ifindex");
> +			goto out_free;
> +		}
> +
> +		ret = -EOPNOTSUPP;
> +		ops = dev->rtnl_link_ops;
> +		if (!ops || !ops->dellink) {
> +			NL_SET_ERR_MSG_ATTR(extack, nla, "Device cannot be deleted");
> +			goto out_free;
> +		}
> +
> +		dev_list[i++] = dev;
> +	}
> +
> +	/* Sort devices, so we could skip duplicates */
> +	sort(dev_list, num_devices, sizeof(*dev_list), dev_ifindex_cmp, NULL);

how did this sort change the results? 10k compares and re-order has to
add some overhead.

> +
> +	for (i = 0; i < num_devices; i++) {
> +		struct net_device *dev = dev_list[i];
> +
> +		if (i != 0 && dev_list[i - 1]->ifindex == dev->ifindex)

		if (i && ...)


I liked the array variant better. Jakub?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-06 15:20   ` David Ahern
@ 2021-12-07  8:27     ` Nicolas Dichtel
  0 siblings, 0 replies; 18+ messages in thread
From: Nicolas Dichtel @ 2021-12-07  8:27 UTC (permalink / raw)
  To: David Ahern, Lahav Schlesinger, netdev; +Cc: kuba, nikolay

Le 06/12/2021 à 16:20, David Ahern a écrit :
> On 12/6/21 1:25 AM, Nicolas Dichtel wrote:
>>> @@ -3050,6 +3052,78 @@ static int rtnl_group_dellink(const struct net *net, int group)
>>>  	return 0;
>>>  }
>>>  
>>> +static int dev_ifindex_cmp(const void *a, const void *b)
>>> +{
>>> +	struct net_device * const *dev1 = a, * const *dev2 = b;
>>> +
>>> +	return (*dev1)->ifindex - (*dev2)->ifindex;
>>> +}
>>> +
>>> +static int rtnl_ifindex_dellink(struct net *net, struct nlattr *head, int len,
>>> +				struct netlink_ext_ack *extack)
>>> +{
>>> +	int i = 0, num_devices = 0, rem;
>>> +	struct net_device **dev_list;
>>> +	const struct nlattr *nla;
>>> +	LIST_HEAD(list_kill);
>>> +	int ret;
>>> +
>>> +	nla_for_each_attr(nla, head, len, rem) {
>>> +		if (nla_type(nla) == IFLA_IFINDEX)
>>> +			num_devices++;
>>> +	}
>>> +
>>> +	dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
>>> +	if (!dev_list)
>>> +		return -ENOMEM;
>>> +
>>> +	nla_for_each_attr(nla, head, len, rem) {
>>> +		const struct rtnl_link_ops *ops;
>>> +		struct net_device *dev;
>>> +		int ifindex;
>>> +
>>> +		if (nla_type(nla) != IFLA_IFINDEX)
>>> +			continue;
>>> +
>>> +		ifindex = nla_get_s32(nla);
>>> +		ret = -ENODEV;
>>> +		dev = __dev_get_by_index(net, ifindex);
>>> +		if (!dev) {
>>> +			NL_SET_ERR_MSG_ATTR(extack, nla, "Unknown ifindex");
>> It would be nice to have the ifindex in the error message. This message does not
>> give more information than "ENODEV".
> 
> extack infra does not allow dynamic messages. It does point to the
> location of the bad index, so userspace could figure it out.
Oh yes, my bad.


Thank you,
Nicolas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-06  8:25 ` Nicolas Dichtel
  2021-12-06 15:20   ` David Ahern
@ 2021-12-07 12:48   ` Lahav Schlesinger
  2021-12-08  7:44     ` Nicolas Dichtel
  1 sibling, 1 reply; 18+ messages in thread
From: Lahav Schlesinger @ 2021-12-07 12:48 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, kuba, dsahern, nikolay

On Mon, Dec 06, 2021 at 09:25:17AM +0100, Nicolas Dichtel wrote:
> CAUTION: External E-Mail - Use caution with links and attachments
>
>
> Le 05/12/2021 à 10:36, Lahav Schlesinger a écrit :
> Some comments below, but please, keep the people who replied to previous
> versions of this patch in cc.
>
> [snip]
>
> > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> > index eebd3894fe89..68fcde9c0c5e 100644
> > --- a/include/uapi/linux/if_link.h
> > +++ b/include/uapi/linux/if_link.h
> > @@ -348,6 +348,7 @@ enum {
> >       IFLA_PARENT_DEV_NAME,
> >       IFLA_PARENT_DEV_BUS_NAME,
> >
> > +     IFLA_IFINDEX,
> nit: maybe the previous blank line sit better after this new attribute (and
> before __IFLA_MAX)?

Due to the comment above the previous 2 attributes, I think that by
removing this empty line it can be accidentally thought as if the new
attribute is part of this "block".
As for adding a new line before __IFLA_MAX, I wanted to preserve the
appearance we had before the IFLA_PARENT_DEV_xxx attributes where added,
where there was no empty line before __IFLA_MAX.

I don't mind either way though, whatever looks better to you.

>
> >       __IFLA_MAX
> >  };
> >
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index fd030e02f16d..5165cc699d97 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -37,6 +37,7 @@
> >  #include <linux/pci.h>
> >  #include <linux/etherdevice.h>
> >  #include <linux/bpf.h>
> > +#include <linux/sort.h>
> >
> >  #include <linux/uaccess.h>
> >
> > @@ -1880,6 +1881,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> >       [IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
> >       [IFLA_NEW_IFINDEX]      = NLA_POLICY_MIN(NLA_S32, 1),
> >       [IFLA_PARENT_DEV_NAME]  = { .type = NLA_NUL_STRING },
> > +     [IFLA_IFINDEX]          = { .type = NLA_S32 },
> Same policy than IFLA_NEW_IFINDEX to refuse negative ifindex.

Right, thanks

>
> >  };
> >
> >  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> > @@ -3050,6 +3052,78 @@ static int rtnl_group_dellink(const struct net *net, int group)
> >       return 0;
> >  }
> >
> > +static int dev_ifindex_cmp(const void *a, const void *b)
> > +{
> > +     struct net_device * const *dev1 = a, * const *dev2 = b;
> > +
> > +     return (*dev1)->ifindex - (*dev2)->ifindex;
> > +}
> > +
> > +static int rtnl_ifindex_dellink(struct net *net, struct nlattr *head, int len,
> > +                             struct netlink_ext_ack *extack)
> > +{
> > +     int i = 0, num_devices = 0, rem;
> > +     struct net_device **dev_list;
> > +     const struct nlattr *nla;
> > +     LIST_HEAD(list_kill);
> > +     int ret;
> > +
> > +     nla_for_each_attr(nla, head, len, rem) {
> > +             if (nla_type(nla) == IFLA_IFINDEX)
> > +                     num_devices++;
> > +     }
> > +
> > +     dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
> > +     if (!dev_list)
> > +             return -ENOMEM;
> > +
> > +     nla_for_each_attr(nla, head, len, rem) {
> > +             const struct rtnl_link_ops *ops;
> > +             struct net_device *dev;
> > +             int ifindex;
> > +
> > +             if (nla_type(nla) != IFLA_IFINDEX)
> > +                     continue;
> > +
> > +             ifindex = nla_get_s32(nla);
> > +             ret = -ENODEV;
> > +             dev = __dev_get_by_index(net, ifindex);
> > +             if (!dev) {
> > +                     NL_SET_ERR_MSG_ATTR(extack, nla, "Unknown ifindex");
> It would be nice to have the ifindex in the error message. This message does not
> give more information than "ENODEV".
>
> > +                     goto out_free;
> > +             }
> > +
> > +             ret = -EOPNOTSUPP;
> > +             ops = dev->rtnl_link_ops;
> > +             if (!ops || !ops->dellink) {
> > +                     NL_SET_ERR_MSG_ATTR(extack, nla, "Device cannot be deleted");
> Same here.
>
>
> Thank you,
> Nicolas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-07  4:12 ` David Ahern
@ 2021-12-07 15:37   ` Jakub Kicinski
  2021-12-08 21:50     ` Lahav Schlesinger
  2021-12-08 21:47   ` Lahav Schlesinger
  1 sibling, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2021-12-07 15:37 UTC (permalink / raw)
  To: David Ahern; +Cc: Lahav Schlesinger, netdev, nikolay

On Mon, 6 Dec 2021 21:12:33 -0700 David Ahern wrote:
> The need to walk the list twice (3 really with the sort) to means the
> array solution is better.

Technically the linear walk will be dwarfed by the sort for non-trivial
inputs (have I been conducting too many coding interviews?) 

Especially with retpolines :(

> I liked the array variant better. Jakub?

I always default to attr-per-member unless there is a strong constraint
on message size. Trying to extend old school netlink (TC etc) left me
scarred. But we can do array, no big deal.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-07 12:48   ` Lahav Schlesinger
@ 2021-12-08  7:44     ` Nicolas Dichtel
  0 siblings, 0 replies; 18+ messages in thread
From: Nicolas Dichtel @ 2021-12-08  7:44 UTC (permalink / raw)
  To: Lahav Schlesinger; +Cc: netdev, kuba, dsahern, nikolay

Le 07/12/2021 à 13:48, Lahav Schlesinger a écrit :
> On Mon, Dec 06, 2021 at 09:25:17AM +0100, Nicolas Dichtel wrote:
>> CAUTION: External E-Mail - Use caution with links and attachments
>>
>>
>> Le 05/12/2021 à 10:36, Lahav Schlesinger a écrit :
>> Some comments below, but please, keep the people who replied to previous
>> versions of this patch in cc.
>>
>> [snip]
>>
>>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>> index eebd3894fe89..68fcde9c0c5e 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -348,6 +348,7 @@ enum {
>>>       IFLA_PARENT_DEV_NAME,
>>>       IFLA_PARENT_DEV_BUS_NAME,
>>>
>>> +     IFLA_IFINDEX,
>> nit: maybe the previous blank line sit better after this new attribute (and
>> before __IFLA_MAX)?
> 
> Due to the comment above the previous 2 attributes, I think that by
> removing this empty line it can be accidentally thought as if the new
> attribute is part of this "block".
> As for adding a new line before __IFLA_MAX, I wanted to preserve the
> appearance we had before the IFLA_PARENT_DEV_xxx attributes where added,
> where there was no empty line before __IFLA_MAX.
Good point.

> 
> I don't mind either way though, whatever looks better to you.
Ok, forget my comment.


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-07  4:12 ` David Ahern
  2021-12-07 15:37   ` Jakub Kicinski
@ 2021-12-08 21:47   ` Lahav Schlesinger
  2021-12-08 23:43     ` David Ahern
  1 sibling, 1 reply; 18+ messages in thread
From: Lahav Schlesinger @ 2021-12-08 21:47 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, kuba, nikolay

On Mon, Dec 06, 2021 at 09:12:33PM -0700, David Ahern wrote:
> CAUTION: External E-Mail - Use caution with links and attachments
>
>
> On 12/5/21 2:36 AM, Lahav Schlesinger wrote:
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index fd030e02f16d..5165cc699d97 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -37,6 +37,7 @@
> >  #include <linux/pci.h>
> >  #include <linux/etherdevice.h>
> >  #include <linux/bpf.h>
> > +#include <linux/sort.h>
> >
> >  #include <linux/uaccess.h>
> >
> > @@ -1880,6 +1881,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> >       [IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
> >       [IFLA_NEW_IFINDEX]      = NLA_POLICY_MIN(NLA_S32, 1),
> >       [IFLA_PARENT_DEV_NAME]  = { .type = NLA_NUL_STRING },
> > +     [IFLA_IFINDEX]          = { .type = NLA_S32 },
>
> you need to make sure this new attribute can not be used in setlink
> requests or getlink.

Right, I'll add it.

>
> >  };
> >
> >  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> > @@ -3050,6 +3052,78 @@ static int rtnl_group_dellink(const struct net *net, int group)
> >       return 0;
> >  }
> >
> > +static int dev_ifindex_cmp(const void *a, const void *b)
> > +{
> > +     struct net_device * const *dev1 = a, * const *dev2 = b;
>
> const struct net_device *dev1 = 1, *dev2 = b;

The array stores 'struct net_device *', and 'a' and 'b' are pointers to
elements in this array (so 'struct net_device **', ignoring constness)

>
> > +
> > +     return (*dev1)->ifindex - (*dev2)->ifindex;
> > +}
> > +
> > +static int rtnl_ifindex_dellink(struct net *net, struct nlattr *head, int len,
> > +                             struct netlink_ext_ack *extack)
> > +{
> > +     int i = 0, num_devices = 0, rem;
> > +     struct net_device **dev_list;
> > +     const struct nlattr *nla;
> > +     LIST_HEAD(list_kill);
> > +     int ret;
> > +
> > +     nla_for_each_attr(nla, head, len, rem) {
> > +             if (nla_type(nla) == IFLA_IFINDEX)
> > +                     num_devices++;
> > +     }
>
>
> The need to walk the list twice (3 really with the sort) to means the
> array solution is better.
>
> > +
> > +     dev_list = kmalloc_array(num_devices, sizeof(*dev_list), GFP_KERNEL);
> > +     if (!dev_list)
> > +             return -ENOMEM;
> > +
> > +     nla_for_each_attr(nla, head, len, rem) {
> > +             const struct rtnl_link_ops *ops;
> > +             struct net_device *dev;
> > +             int ifindex;
> > +
> > +             if (nla_type(nla) != IFLA_IFINDEX)
> > +                     continue;
>
> and this business too. We have arrays in other places
> (net/ipv4/nexthop.c), so this is not the first.
>
>
> > +
> > +             ifindex = nla_get_s32(nla);
> > +             ret = -ENODEV;
> > +             dev = __dev_get_by_index(net, ifindex);
> > +             if (!dev) {
> > +                     NL_SET_ERR_MSG_ATTR(extack, nla, "Unknown ifindex");
> > +                     goto out_free;
> > +             }
> > +
> > +             ret = -EOPNOTSUPP;
> > +             ops = dev->rtnl_link_ops;
> > +             if (!ops || !ops->dellink) {
> > +                     NL_SET_ERR_MSG_ATTR(extack, nla, "Device cannot be deleted");
> > +                     goto out_free;
> > +             }
> > +
> > +             dev_list[i++] = dev;
> > +     }
> > +
> > +     /* Sort devices, so we could skip duplicates */
> > +     sort(dev_list, num_devices, sizeof(*dev_list), dev_ifindex_cmp, NULL);
>
> how did this sort change the results? 10k compares and re-order has to
> add some overhead.

No visible changes from what I saw, this API is as fast as group
deletion. Maybe a few tens of milliseconds slower, but it's lost in the
noise.
I'll run more thorough benchmarks to get to a more conclusive conclusion.

Also just pointing out that the sort will be needed even if we pass an
array (IFLA_IFINDEX_LIST) instead.
Feels like CS 101, but do you have a better approach for detecting
duplicates in an array? I imagine a hash table will be slower as it will
need to allocate a node object for each device (assuming we don't want
to add a new hlist_node to 'struct net_device' just for this)

>
> > +
> > +     for (i = 0; i < num_devices; i++) {
> > +             struct net_device *dev = dev_list[i];
> > +
> > +             if (i != 0 && dev_list[i - 1]->ifindex == dev->ifindex)
>
>                 if (i && ...)
>
>
> I liked the array variant better. Jakub?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-07 15:37   ` Jakub Kicinski
@ 2021-12-08 21:50     ` Lahav Schlesinger
  0 siblings, 0 replies; 18+ messages in thread
From: Lahav Schlesinger @ 2021-12-08 21:50 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David Ahern, netdev, nikolay

On Tue, Dec 07, 2021 at 07:37:15AM -0800, Jakub Kicinski wrote:
> CAUTION: External E-Mail - Use caution with links and attachments
>
>
> On Mon, 6 Dec 2021 21:12:33 -0700 David Ahern wrote:
> > The need to walk the list twice (3 really with the sort) to means the
> > array solution is better.
>
> Technically the linear walk will be dwarfed by the sort for non-trivial
> inputs (have I been conducting too many coding interviews?)
>
> Especially with retpolines :(
>
> > I liked the array variant better. Jakub?
>
> I always default to attr-per-member unless there is a strong constraint
> on message size. Trying to extend old school netlink (TC etc) left me
> scarred. But we can do array, no big deal.

Thanks, I'll send a new patch with array API and all previous comments

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-08 21:47   ` Lahav Schlesinger
@ 2021-12-08 23:43     ` David Ahern
  2021-12-09  0:04       ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: David Ahern @ 2021-12-08 23:43 UTC (permalink / raw)
  To: Lahav Schlesinger; +Cc: netdev, kuba, nikolay

On 12/8/21 2:47 PM, Lahav Schlesinger wrote:
> No visible changes from what I saw, this API is as fast as group
> deletion. Maybe a few tens of milliseconds slower, but it's lost in the
> noise.
> I'll run more thorough benchmarks to get to a more conclusive conclusion.
> 
> Also just pointing out that the sort will be needed even if we pass an
> array (IFLA_IFINDEX_LIST) instead.
> Feels like CS 101, but do you have a better approach for detecting
> duplicates in an array? I imagine a hash table will be slower as it will
> need to allocate a node object for each device (assuming we don't want
> to add a new hlist_node to 'struct net_device' just for this)

I think marking the dev's and then using a delete loop is going to be
the better approach - avoid the sort and duplicate problem. I use that
approach for nexthop deletes:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/ipv4/nexthop.c#n1849

Find a hole in net_device struct in an area used only for control path
and add 'bool grp_delete' (or a 1-bit hole). Mark the devices on pass
and delete them on another.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-08 23:43     ` David Ahern
@ 2021-12-09  0:04       ` Jakub Kicinski
  2021-12-09  0:18         ` David Ahern
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2021-12-09  0:04 UTC (permalink / raw)
  To: David Ahern; +Cc: Lahav Schlesinger, netdev, nikolay

On Wed, 8 Dec 2021 16:43:28 -0700 David Ahern wrote:
> On 12/8/21 2:47 PM, Lahav Schlesinger wrote:
> > No visible changes from what I saw, this API is as fast as group
> > deletion. Maybe a few tens of milliseconds slower, but it's lost in the
> > noise.
> > I'll run more thorough benchmarks to get to a more conclusive conclusion.
> > 
> > Also just pointing out that the sort will be needed even if we pass an
> > array (IFLA_IFINDEX_LIST) instead.
> > Feels like CS 101, but do you have a better approach for detecting
> > duplicates in an array? I imagine a hash table will be slower as it will
> > need to allocate a node object for each device (assuming we don't want
> > to add a new hlist_node to 'struct net_device' just for this)  
> 
> I think marking the dev's and then using a delete loop is going to be
> the better approach - avoid the sort and duplicate problem. I use that
> approach for nexthop deletes:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/ipv4/nexthop.c#n1849
> 
> Find a hole in net_device struct in an area used only for control path
> and add 'bool grp_delete' (or a 1-bit hole). Mark the devices on pass
> and delete them on another.

If we want to keep state in the netdev itself we can probably piggy
back on dev->unreg_list. It should be initialized to empty and not
touched unless device goes thru unregister.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-09  0:04       ` Jakub Kicinski
@ 2021-12-09  0:18         ` David Ahern
  2021-12-09  0:45           ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: David Ahern @ 2021-12-09  0:18 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Lahav Schlesinger, netdev, nikolay

On 12/8/21 5:04 PM, Jakub Kicinski wrote:
> On Wed, 8 Dec 2021 16:43:28 -0700 David Ahern wrote:
>> On 12/8/21 2:47 PM, Lahav Schlesinger wrote:
>>> No visible changes from what I saw, this API is as fast as group
>>> deletion. Maybe a few tens of milliseconds slower, but it's lost in the
>>> noise.
>>> I'll run more thorough benchmarks to get to a more conclusive conclusion.
>>>
>>> Also just pointing out that the sort will be needed even if we pass an
>>> array (IFLA_IFINDEX_LIST) instead.
>>> Feels like CS 101, but do you have a better approach for detecting
>>> duplicates in an array? I imagine a hash table will be slower as it will
>>> need to allocate a node object for each device (assuming we don't want
>>> to add a new hlist_node to 'struct net_device' just for this)  
>>
>> I think marking the dev's and then using a delete loop is going to be
>> the better approach - avoid the sort and duplicate problem. I use that
>> approach for nexthop deletes:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/ipv4/nexthop.c#n1849
>>
>> Find a hole in net_device struct in an area used only for control path
>> and add 'bool grp_delete' (or a 1-bit hole). Mark the devices on pass
>> and delete them on another.
> 
> If we want to keep state in the netdev itself we can probably piggy
> back on dev->unreg_list. It should be initialized to empty and not
> touched unless device goes thru unregister.
> 

isn't that used when the delink function calls unregister_netdevice_queue?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-09  0:18         ` David Ahern
@ 2021-12-09  0:45           ` Jakub Kicinski
  2021-12-09  0:47             ` David Ahern
  2022-01-04  8:21             ` Lahav Schlesinger
  0 siblings, 2 replies; 18+ messages in thread
From: Jakub Kicinski @ 2021-12-09  0:45 UTC (permalink / raw)
  To: David Ahern; +Cc: Lahav Schlesinger, netdev, nikolay

On Wed, 8 Dec 2021 17:18:48 -0700 David Ahern wrote:
> On 12/8/21 5:04 PM, Jakub Kicinski wrote:
> >> I think marking the dev's and then using a delete loop is going to be
> >> the better approach - avoid the sort and duplicate problem. I use that
> >> approach for nexthop deletes:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/ipv4/nexthop.c#n1849
> >>
> >> Find a hole in net_device struct in an area used only for control path
> >> and add 'bool grp_delete' (or a 1-bit hole). Mark the devices on pass
> >> and delete them on another.  
> > 
> > If we want to keep state in the netdev itself we can probably piggy
> > back on dev->unreg_list. It should be initialized to empty and not
> > touched unless device goes thru unregister.
> 
> isn't that used when the delink function calls unregister_netdevice_queue?

Sure but all the validation is before we start calling delink, so
doesn't matter?

list to_kill, queued;

for_each_attr(nest) {
	...

	dev = get_by_index(nla_get_s32(..));
	if (!dev)
		goto cleanup;
	if (!list_empty(&dev->unreg_list))
		goto cleanup;
	...

	list_add(&to_kill, &dev->unreg_list);
}

for_each_entry_safe(to_kill) {
	list_del_init(&dev->unreg_list);
	->dellink(dev, queued);
}

unreg_many(queued);

return

cleanup:
	for_each_entry_safe(to_kill) {
		list_del_init(&dev->unreg_list);
	}

No?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-09  0:45           ` Jakub Kicinski
@ 2021-12-09  0:47             ` David Ahern
  2022-01-04  8:21             ` Lahav Schlesinger
  1 sibling, 0 replies; 18+ messages in thread
From: David Ahern @ 2021-12-09  0:47 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Lahav Schlesinger, netdev, nikolay

On 12/8/21 5:45 PM, Jakub Kicinski wrote:
> On Wed, 8 Dec 2021 17:18:48 -0700 David Ahern wrote:
>> On 12/8/21 5:04 PM, Jakub Kicinski wrote:
>>>> I think marking the dev's and then using a delete loop is going to be
>>>> the better approach - avoid the sort and duplicate problem. I use that
>>>> approach for nexthop deletes:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/ipv4/nexthop.c#n1849
>>>>
>>>> Find a hole in net_device struct in an area used only for control path
>>>> and add 'bool grp_delete' (or a 1-bit hole). Mark the devices on pass
>>>> and delete them on another.  
>>>
>>> If we want to keep state in the netdev itself we can probably piggy
>>> back on dev->unreg_list. It should be initialized to empty and not
>>> touched unless device goes thru unregister.
>>
>> isn't that used when the delink function calls unregister_netdevice_queue?
> 
> Sure but all the validation is before we start calling delink, so
> doesn't matter?
> 
> list to_kill, queued;
> 
> for_each_attr(nest) {
> 	...
> 
> 	dev = get_by_index(nla_get_s32(..));
> 	if (!dev)
> 		goto cleanup;
> 	if (!list_empty(&dev->unreg_list))
> 		goto cleanup;
> 	...
> 
> 	list_add(&to_kill, &dev->unreg_list);
> }
> 
> for_each_entry_safe(to_kill) {
> 	list_del_init(&dev->unreg_list);

sure my mental model was walking the list and not doing that part. :-)


> 	->dellink(dev, queued);
> }
> 
> unreg_many(queued);
> 
> return
> 
> cleanup:
> 	for_each_entry_safe(to_kill) {
> 		list_del_init(&dev->unreg_list);
> 	}
> 
> No?
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2021-12-09  0:45           ` Jakub Kicinski
  2021-12-09  0:47             ` David Ahern
@ 2022-01-04  8:21             ` Lahav Schlesinger
  2022-01-04 14:55               ` Jakub Kicinski
  1 sibling, 1 reply; 18+ messages in thread
From: Lahav Schlesinger @ 2022-01-04  8:21 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David Ahern, netdev, nikolay

On Wed, Dec 08, 2021 at 04:45:44PM -0800, Jakub Kicinski wrote:
> CAUTION: External E-Mail - Use caution with links and attachments
>
>
> On Wed, 8 Dec 2021 17:18:48 -0700 David Ahern wrote:
> > On 12/8/21 5:04 PM, Jakub Kicinski wrote:
> > >> I think marking the dev's and then using a delete loop is going to be
> > >> the better approach - avoid the sort and duplicate problem. I use that
> > >> approach for nexthop deletes:
> > >>
> > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_netdev_net-2Dnext.git_tree_net_ipv4_nexthop.c-23n1849&d=DwICAg&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=-WbGKi6-8WU3IfnrkySymtcZWJMn_aVdolIPebPZGu4&m=td9sv1_NnALM3QjLFg8h9u1P7DUJJL2YhzqyneZ95Gc&s=nEGgvxCwd3MrSXWhv-5Fo7mUlBkuKbHRBAO4VO9mG2o&e=
> > >>
> > >> Find a hole in net_device struct in an area used only for control path
> > >> and add 'bool grp_delete' (or a 1-bit hole). Mark the devices on pass
> > >> and delete them on another.
> > >
> > > If we want to keep state in the netdev itself we can probably piggy
> > > back on dev->unreg_list. It should be initialized to empty and not
> > > touched unless device goes thru unregister.
> >
> > isn't that used when the delink function calls unregister_netdevice_queue?
>
> Sure but all the validation is before we start calling delink, so
> doesn't matter?
>
> list to_kill, queued;
>
> for_each_attr(nest) {
>         ...
>
>         dev = get_by_index(nla_get_s32(..));
>         if (!dev)
>                 goto cleanup;
>         if (!list_empty(&dev->unreg_list))
>                 goto cleanup;
>         ...
>
>         list_add(&to_kill, &dev->unreg_list);
> }
>
> for_each_entry_safe(to_kill) {
>         list_del_init(&dev->unreg_list);
>         ->dellink(dev, queued);
> }
>
> unreg_many(queued);
>
> return
>
> cleanup:
>         for_each_entry_safe(to_kill) {
>                 list_del_init(&dev->unreg_list);
>         }
>
> No?

Hi Jakub, I just sent a V6 for this patch. Just wanted to note that I
tried this approach of using unreg_list, but it caused issues with e.g.
veth deletion - the dellink() of a veth peer automatically adds the
other peer to the queued list. So if the list of ifindices contains
both peers then when 'to_kill' is iterated, both the current device
and the next pointer will have their 'unreg_list' moved to the other
list, which later raised a page fault when 'to_kill' was further
iterated upon.

Therefore instead I added a big flag in a "hole" inside struct
net_device, as David suggested.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion
  2022-01-04  8:21             ` Lahav Schlesinger
@ 2022-01-04 14:55               ` Jakub Kicinski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2022-01-04 14:55 UTC (permalink / raw)
  To: Lahav Schlesinger; +Cc: David Ahern, netdev, nikolay

On Tue, 4 Jan 2022 10:21:21 +0200 Lahav Schlesinger wrote:
> Hi Jakub, I just sent a V6 for this patch. Just wanted to note that I
> tried this approach of using unreg_list, but it caused issues with e.g.
> veth deletion - the dellink() of a veth peer automatically adds the
> other peer to the queued list. So if the list of ifindices contains
> both peers then when 'to_kill' is iterated, both the current device
> and the next pointer will have their 'unreg_list' moved to the other
> list, which later raised a page fault when 'to_kill' was further
> iterated upon.
> 
> Therefore instead I added a big flag in a "hole" inside struct
> net_device, as David suggested.

Ack. Thanks for trying it out.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-01-04 14:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-05  9:36 [PATCH net-next v5] rtnetlink: Support fine-grained netdevice bulk deletion Lahav Schlesinger
2021-12-05 10:28 ` Nikolay Aleksandrov
2021-12-06  8:25 ` Nicolas Dichtel
2021-12-06 15:20   ` David Ahern
2021-12-07  8:27     ` Nicolas Dichtel
2021-12-07 12:48   ` Lahav Schlesinger
2021-12-08  7:44     ` Nicolas Dichtel
2021-12-07  4:12 ` David Ahern
2021-12-07 15:37   ` Jakub Kicinski
2021-12-08 21:50     ` Lahav Schlesinger
2021-12-08 21:47   ` Lahav Schlesinger
2021-12-08 23:43     ` David Ahern
2021-12-09  0:04       ` Jakub Kicinski
2021-12-09  0:18         ` David Ahern
2021-12-09  0:45           ` Jakub Kicinski
2021-12-09  0:47             ` David Ahern
2022-01-04  8:21             ` Lahav Schlesinger
2022-01-04 14:55               ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.