All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3 00/16] net: xps: improve the xps maps handling
@ 2021-03-12 15:04 Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 01/16] net-sysfs: convert xps_cpus_show to bitmap_zalloc Antoine Tenart
                   ` (15 more replies)
  0 siblings, 16 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Hello,

This series aims at fixing various issues with the xps code, including
out-of-bound accesses and use-after-free. While doing so we try to
improve the xps code maintainability and readability.

The main change is moving dev->num_tc and dev->nr_ids in the xps maps, to
avoid out-of-bound accesses as those two fields can be updated after the
maps have been allocated. This allows further reworks, to improve the
xps code readability and allow to stop taking the rtnl lock when
reading the maps in sysfs. The maps are moved to an array in net_device,
which simplifies the code a lot.

Note that patch 6 adds a check to assert the rtnl lock is taken when
calling netif_set_xps_queue. Two drivers (mlx5 and virtio_net) are fixed
in this regard, but we might see new warnings in the near future because
of this. This is expected and shouldn't be an issue (it's only a
warning, and the fix should be fairly easy to do).

One future improvement may be to remove the use of xps_map_mutex from
net/core/dev.c, but that may require extra care.

Thanks!
Antoine

Since v2:
  - Patches 13-16 are new to the series.
  - Fixed another issue I found while preparing v3 (use after free of
    old xps maps).
  - Kept the rtnl lock when calling netdev_get_tx_queue and
    netdev_txq_to_tc.
  - Use get_device/put_device when using the sb_dev.
  - Take the rtnl lock in mlx5 and virtio_net when calling
    netif_set_xps_queue.
  - Fixed a coding style issue.

Since v1:
  - Reordered the patches to improve readability and avoid introducing
    issues in between patches.
  - Use dev_maps->nr_ids to allocate the mask in xps_queue_show but
    still default to nr_cpu_ids/dev->num_rx_queues in xps_queue_show
    when dev_maps hasn't been allocated yet for backward compatibility.

Antoine Tenart (16):
  net-sysfs: convert xps_cpus_show to bitmap_zalloc
  net-sysfs: store the return of get_netdev_queue_index in an unsigned
    int
  net-sysfs: make xps_cpus_show and xps_rxqs_show consistent
  net: embed num_tc in the xps maps
  net: embed nr_ids in the xps maps
  net: assert the rtnl lock is held when calling __netif_set_xps_queue
  net: remove the xps possible_mask
  net: move the xps maps to an array
  net: add an helper to copy xps maps to the new dev_maps
  net: improve queue removal readability in __netif_set_xps_queue
  net-sysfs: move the rtnl unlock up in the xps show helpers
  net-sysfs: move the xps cpus/rxqs retrieval in a common function
  net: fix use after free in xps
  net: NULL the old xps map entries when freeing them
  net/mlx5e: take the rtnl lock when calling netif_set_xps_queue
  virtio_net: take the rtnl lock when calling virtnet_set_affinity

 .../net/ethernet/mellanox/mlx5/core/en_main.c |  11 +-
 drivers/net/virtio_net.c                      |   8 +-
 include/linux/netdevice.h                     |  27 +-
 net/core/dev.c                                | 250 +++++++++---------
 net/core/net-sysfs.c                          | 177 +++++--------
 5 files changed, 233 insertions(+), 240 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 01/16] net-sysfs: convert xps_cpus_show to bitmap_zalloc
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 02/16] net-sysfs: store the return of get_netdev_queue_index in an unsigned int Antoine Tenart
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Use bitmap_zalloc instead of zalloc_cpumask_var in xps_cpus_show to
align with xps_rxqs_show. This will improve maintenance and allow us to
factorize the two functions. The function should behave the same.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/net-sysfs.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 307628fdf380..3a083c0c9dd3 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1367,8 +1367,7 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 	int cpu, len, ret, num_tc = 1, tc = 0;
 	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
-	cpumask_var_t mask;
-	unsigned long index;
+	unsigned long *mask, index;
 
 	if (!netif_is_multiqueue(dev))
 		return -ENOENT;
@@ -1396,7 +1395,8 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 		}
 	}
 
-	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
+	mask = bitmap_zalloc(nr_cpu_ids, GFP_KERNEL);
+	if (!mask) {
 		ret = -ENOMEM;
 		goto err_rtnl_unlock;
 	}
@@ -1414,7 +1414,7 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 
 			for (i = map->len; i--;) {
 				if (map->queues[i] == index) {
-					cpumask_set_cpu(cpu, mask);
+					set_bit(cpu, mask);
 					break;
 				}
 			}
@@ -1424,8 +1424,8 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 
 	rtnl_unlock();
 
-	len = snprintf(buf, PAGE_SIZE, "%*pb\n", cpumask_pr_args(mask));
-	free_cpumask_var(mask);
+	len = bitmap_print_to_pagebuf(false, buf, mask, nr_cpu_ids);
+	bitmap_free(mask);
 	return len < PAGE_SIZE ? len : -EINVAL;
 
 err_rtnl_unlock:
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 02/16] net-sysfs: store the return of get_netdev_queue_index in an unsigned int
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 01/16] net-sysfs: convert xps_cpus_show to bitmap_zalloc Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 03/16] net-sysfs: make xps_cpus_show and xps_rxqs_show consistent Antoine Tenart
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

In net-sysfs, get_netdev_queue_index returns an unsigned int. Some of
its callers use an unsigned long to store the returned value. Update the
code to be consistent, this should only be cosmetic.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/net-sysfs.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 3a083c0c9dd3..5dc4223f6b68 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1367,7 +1367,8 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 	int cpu, len, ret, num_tc = 1, tc = 0;
 	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
-	unsigned long *mask, index;
+	unsigned long *mask;
+	unsigned int index;
 
 	if (!netif_is_multiqueue(dev))
 		return -ENOENT;
@@ -1437,7 +1438,7 @@ static ssize_t xps_cpus_store(struct netdev_queue *queue,
 			      const char *buf, size_t len)
 {
 	struct net_device *dev = queue->dev;
-	unsigned long index;
+	unsigned int index;
 	cpumask_var_t mask;
 	int err;
 
@@ -1479,7 +1480,8 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 	int j, len, ret, num_tc = 1, tc = 0;
 	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
-	unsigned long *mask, index;
+	unsigned long *mask;
+	unsigned int index;
 
 	index = get_netdev_queue_index(queue);
 
@@ -1541,7 +1543,8 @@ static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
 {
 	struct net_device *dev = queue->dev;
 	struct net *net = dev_net(dev);
-	unsigned long *mask, index;
+	unsigned long *mask;
+	unsigned int index;
 	int err;
 
 	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 03/16] net-sysfs: make xps_cpus_show and xps_rxqs_show consistent
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 01/16] net-sysfs: convert xps_cpus_show to bitmap_zalloc Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 02/16] net-sysfs: store the return of get_netdev_queue_index in an unsigned int Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 04/16] net: embed num_tc in the xps maps Antoine Tenart
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Make the implementations of xps_cpus_show and xps_rxqs_show to converge,
as the two share the same logic but diverted over time. This should not
modify their behaviour but will help future changes and improve
maintenance.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/net-sysfs.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 5dc4223f6b68..5f76183ad5bc 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1364,7 +1364,7 @@ static const struct attribute_group dql_group = {
 static ssize_t xps_cpus_show(struct netdev_queue *queue,
 			     char *buf)
 {
-	int cpu, len, ret, num_tc = 1, tc = 0;
+	int j, len, ret, num_tc = 1, tc = 0;
 	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
 	unsigned long *mask;
@@ -1404,23 +1404,26 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_cpus_map);
-	if (dev_maps) {
-		for_each_possible_cpu(cpu) {
-			int i, tci = cpu * num_tc + tc;
-			struct xps_map *map;
-
-			map = rcu_dereference(dev_maps->attr_map[tci]);
-			if (!map)
-				continue;
-
-			for (i = map->len; i--;) {
-				if (map->queues[i] == index) {
-					set_bit(cpu, mask);
-					break;
-				}
+	if (!dev_maps)
+		goto out_no_maps;
+
+	for (j = -1; j = netif_attrmask_next(j, NULL, nr_cpu_ids),
+	     j < nr_cpu_ids;) {
+		int i, tci = j * num_tc + tc;
+		struct xps_map *map;
+
+		map = rcu_dereference(dev_maps->attr_map[tci]);
+		if (!map)
+			continue;
+
+		for (i = map->len; i--;) {
+			if (map->queues[i] == index) {
+				set_bit(j, mask);
+				break;
 			}
 		}
 	}
+out_no_maps:
 	rcu_read_unlock();
 
 	rtnl_unlock();
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 04/16] net: embed num_tc in the xps maps
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (2 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 03/16] net-sysfs: make xps_cpus_show and xps_rxqs_show consistent Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 05/16] net: embed nr_ids " Antoine Tenart
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

The xps cpus/rxqs map is accessed using dev->num_tc, which is used when
allocating the map. But later updates of dev->num_tc can lead to having
a mismatch between the maps and how they're accessed. In such cases the
map values do not make any sense and out of bound accesses can occur
(that can be easily seen using KASAN).

This patch aims at fixing this by embedding num_tc into the maps, using
the value at the time the map is created. This brings two improvements:
- The maps can be accessed using the embedded num_tc, so we know for
  sure we won't have out of bound accesses.
- Checks can be made before accessing the maps so we know the values
  retrieved will make sense.

We also update __netif_set_xps_queue to conditionally copy old maps from
dev_maps in the new one only if the number of traffic classes from both
maps match.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 include/linux/netdevice.h |  6 ++++
 net/core/dev.c            | 63 +++++++++++++++++++++++++--------------
 net/core/net-sysfs.c      | 45 +++++++++++-----------------
 3 files changed, 64 insertions(+), 50 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b379d08a12ed..aa5c45198785 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -771,9 +771,15 @@ struct xps_map {
 
 /*
  * This structure holds all XPS maps for device.  Maps are indexed by CPU.
+ *
+ * We keep track of the number of traffic classes used when the struct is
+ * allocated, in num_tc. This will be used to navigate the maps, to ensure we're
+ * not crossing its upper bound, as the original dev->num_tc can be updated in
+ * the meantime.
  */
 struct xps_dev_maps {
 	struct rcu_head rcu;
+	s16 num_tc;
 	struct xps_map __rcu *attr_map[]; /* Either CPUs map or RXQs map */
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 2bfdd528c7c3..3fa96d54654c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2491,7 +2491,7 @@ static bool remove_xps_queue_cpu(struct net_device *dev,
 				 struct xps_dev_maps *dev_maps,
 				 int cpu, u16 offset, u16 count)
 {
-	int num_tc = dev->num_tc ? : 1;
+	int num_tc = dev_maps->num_tc;
 	bool active = false;
 	int tci;
 
@@ -2634,10 +2634,10 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 {
 	const unsigned long *online_mask = NULL, *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL;
+	bool active = false, copy = false;
 	int i, j, tci, numa_node_id = -2;
 	int maps_sz, num_tc = 1, tc = 0;
 	struct xps_map *map, *new_map;
-	bool active = false;
 	unsigned int nr_ids;
 
 	if (dev->num_tc) {
@@ -2672,19 +2672,29 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	if (maps_sz < L1_CACHE_BYTES)
 		maps_sz = L1_CACHE_BYTES;
 
+	/* The old dev_maps could be larger or smaller than the one we're
+	 * setting up now, as dev->num_tc could have been updated in between. We
+	 * could try to be smart, but let's be safe instead and only copy
+	 * foreign traffic classes if the two map sizes match.
+	 */
+	if (dev_maps && dev_maps->num_tc == num_tc)
+		copy = true;
+
 	/* allocate memory for queue storage */
 	for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
 	     j < nr_ids;) {
-		if (!new_dev_maps)
-			new_dev_maps = kzalloc(maps_sz, GFP_KERNEL);
 		if (!new_dev_maps) {
-			mutex_unlock(&xps_map_mutex);
-			return -ENOMEM;
+			new_dev_maps = kzalloc(maps_sz, GFP_KERNEL);
+			if (!new_dev_maps) {
+				mutex_unlock(&xps_map_mutex);
+				return -ENOMEM;
+			}
+
+			new_dev_maps->num_tc = num_tc;
 		}
 
 		tci = j * num_tc + tc;
-		map = dev_maps ? xmap_dereference(dev_maps->attr_map[tci]) :
-				 NULL;
+		map = copy ? xmap_dereference(dev_maps->attr_map[tci]) : NULL;
 
 		map = expand_xps_map(map, j, index, is_rxqs_map);
 		if (!map)
@@ -2706,7 +2716,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
 	     j < nr_ids;) {
 		/* copy maps belonging to foreign traffic classes */
-		for (i = tc, tci = j * num_tc; dev_maps && i--; tci++) {
+		for (i = tc, tci = j * num_tc; copy && i--; tci++) {
 			/* fill in the new device map from the old device map */
 			map = xmap_dereference(dev_maps->attr_map[tci]);
 			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
@@ -2736,14 +2746,14 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 					numa_node_id = -1;
 			}
 #endif
-		} else if (dev_maps) {
+		} else if (copy) {
 			/* fill in the new device map from the old device map */
 			map = xmap_dereference(dev_maps->attr_map[tci]);
 			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
 		/* copy maps belonging to foreign traffic classes */
-		for (i = num_tc - tc, tci++; dev_maps && --i; tci++) {
+		for (i = num_tc - tc, tci++; copy && --i; tci++) {
 			/* fill in the new device map from the old device map */
 			map = xmap_dereference(dev_maps->attr_map[tci]);
 			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
@@ -2761,11 +2771,18 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 
 	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
 	     j < nr_ids;) {
-		for (i = num_tc, tci = j * num_tc; i--; tci++) {
-			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
+		for (i = num_tc, tci = j * dev_maps->num_tc; i--; tci++) {
 			map = xmap_dereference(dev_maps->attr_map[tci]);
-			if (map && map != new_map)
-				kfree_rcu(map, rcu);
+			if (!map)
+				continue;
+
+			if (copy) {
+				new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
+				if (map == new_map)
+					continue;
+			}
+
+			kfree_rcu(map, rcu);
 		}
 	}
 
@@ -2789,12 +2806,12 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	/* removes tx-queue from unused CPUs/rx-queues */
 	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
 	     j < nr_ids;) {
-		for (i = tc, tci = j * num_tc; i--; tci++)
+		for (i = tc, tci = j * dev_maps->num_tc; i--; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
 		if (!netif_attr_test_mask(j, mask, nr_ids) ||
 		    !netif_attr_test_online(j, online_mask, nr_ids))
 			active |= remove_xps_queue(dev_maps, tci, index);
-		for (i = num_tc - tc, tci++; --i; tci++)
+		for (i = dev_maps->num_tc - tc, tci++; --i; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
 	}
 
@@ -2812,7 +2829,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	     j < nr_ids;) {
 		for (i = num_tc, tci = j * num_tc; i--; tci++) {
 			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
-			map = dev_maps ?
+			map = copy ?
 			      xmap_dereference(dev_maps->attr_map[tci]) :
 			      NULL;
 			if (new_map && new_map != map)
@@ -3944,13 +3961,15 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
 			       struct xps_dev_maps *dev_maps, unsigned int tci)
 {
+	int tc = netdev_get_prio_tc_map(dev, skb->priority);
 	struct xps_map *map;
 	int queue_index = -1;
 
-	if (dev->num_tc) {
-		tci *= dev->num_tc;
-		tci += netdev_get_prio_tc_map(dev, skb->priority);
-	}
+	if (tc >= dev_maps->num_tc)
+		return queue_index;
+
+	tci *= dev_maps->num_tc;
+	tci += tc;
 
 	map = rcu_dereference(dev_maps->attr_map[tci]);
 	if (map) {
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 5f76183ad5bc..1364d0f39cb0 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1364,9 +1364,9 @@ static const struct attribute_group dql_group = {
 static ssize_t xps_cpus_show(struct netdev_queue *queue,
 			     char *buf)
 {
-	int j, len, ret, num_tc = 1, tc = 0;
 	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
+	int j, len, ret, tc = 0;
 	unsigned long *mask;
 	unsigned int index;
 
@@ -1378,22 +1378,13 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 	if (!rtnl_trylock())
 		return restart_syscall();
 
-	if (dev->num_tc) {
-		/* Do not allow XPS on subordinate device directly */
-		num_tc = dev->num_tc;
-		if (num_tc < 0) {
-			ret = -EINVAL;
-			goto err_rtnl_unlock;
-		}
-
-		/* If queue belongs to subordinate dev use its map */
-		dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
+	/* If queue belongs to subordinate dev use its map */
+	dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
 
-		tc = netdev_txq_to_tc(dev, index);
-		if (tc < 0) {
-			ret = -EINVAL;
-			goto err_rtnl_unlock;
-		}
+	tc = netdev_txq_to_tc(dev, index);
+	if (tc < 0) {
+		ret = -EINVAL;
+		goto err_rtnl_unlock;
 	}
 
 	mask = bitmap_zalloc(nr_cpu_ids, GFP_KERNEL);
@@ -1404,12 +1395,12 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_cpus_map);
-	if (!dev_maps)
+	if (!dev_maps || tc >= dev_maps->num_tc)
 		goto out_no_maps;
 
 	for (j = -1; j = netif_attrmask_next(j, NULL, nr_cpu_ids),
 	     j < nr_cpu_ids;) {
-		int i, tci = j * num_tc + tc;
+		int i, tci = j * dev_maps->num_tc + tc;
 		struct xps_map *map;
 
 		map = rcu_dereference(dev_maps->attr_map[tci]);
@@ -1480,9 +1471,9 @@ static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
 
 static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 {
-	int j, len, ret, num_tc = 1, tc = 0;
 	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
+	int j, len, ret, tc = 0;
 	unsigned long *mask;
 	unsigned int index;
 
@@ -1491,14 +1482,12 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 	if (!rtnl_trylock())
 		return restart_syscall();
 
-	if (dev->num_tc) {
-		num_tc = dev->num_tc;
-		tc = netdev_txq_to_tc(dev, index);
-		if (tc < 0) {
-			ret = -EINVAL;
-			goto err_rtnl_unlock;
-		}
+	tc = netdev_txq_to_tc(dev, index);
+	if (tc < 0) {
+		ret = -EINVAL;
+		goto err_rtnl_unlock;
 	}
+
 	mask = bitmap_zalloc(dev->num_rx_queues, GFP_KERNEL);
 	if (!mask) {
 		ret = -ENOMEM;
@@ -1507,12 +1496,12 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_rxqs_map);
-	if (!dev_maps)
+	if (!dev_maps || tc >= dev_maps->num_tc)
 		goto out_no_maps;
 
 	for (j = -1; j = netif_attrmask_next(j, NULL, dev->num_rx_queues),
 	     j < dev->num_rx_queues;) {
-		int i, tci = j * num_tc + tc;
+		int i, tci = j * dev_maps->num_tc + tc;
 		struct xps_map *map;
 
 		map = rcu_dereference(dev_maps->attr_map[tci]);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 05/16] net: embed nr_ids in the xps maps
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (3 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 04/16] net: embed num_tc in the xps maps Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 06/16] net: assert the rtnl lock is held when calling __netif_set_xps_queue Antoine Tenart
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Embed nr_ids (the number of cpu for the xps cpus map, and the number of
rxqs for the xps cpus map) in dev_maps. That will help not accessing out
of bound memory if those values change after dev_maps was allocated.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 include/linux/netdevice.h |  4 ++++
 net/core/dev.c            | 45 ++++++++++++++++++---------------------
 net/core/net-sysfs.c      | 38 +++++++++++++++++++--------------
 3 files changed, 47 insertions(+), 40 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index aa5c45198785..5c9e056a0e2d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -772,6 +772,9 @@ struct xps_map {
 /*
  * This structure holds all XPS maps for device.  Maps are indexed by CPU.
  *
+ * We keep track of the number of cpus/rxqs used when the struct is allocated,
+ * in nr_ids. This will help not accessing out-of-bound memory.
+ *
  * We keep track of the number of traffic classes used when the struct is
  * allocated, in num_tc. This will be used to navigate the maps, to ensure we're
  * not crossing its upper bound, as the original dev->num_tc can be updated in
@@ -779,6 +782,7 @@ struct xps_map {
  */
 struct xps_dev_maps {
 	struct rcu_head rcu;
+	unsigned int nr_ids;
 	s16 num_tc;
 	struct xps_map __rcu *attr_map[]; /* Either CPUs map or RXQs map */
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index 3fa96d54654c..98a9c620f05a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2524,14 +2524,14 @@ static void reset_xps_maps(struct net_device *dev,
 }
 
 static void clean_xps_maps(struct net_device *dev, const unsigned long *mask,
-			   struct xps_dev_maps *dev_maps, unsigned int nr_ids,
-			   u16 offset, u16 count, bool is_rxqs_map)
+			   struct xps_dev_maps *dev_maps, u16 offset, u16 count,
+			   bool is_rxqs_map)
 {
+	unsigned int nr_ids = dev_maps->nr_ids;
 	bool active = false;
 	int i, j;
 
-	for (j = -1; j = netif_attrmask_next(j, mask, nr_ids),
-	     j < nr_ids;)
+	for (j = -1; j = netif_attrmask_next(j, mask, nr_ids), j < nr_ids;)
 		active |= remove_xps_queue_cpu(dev, dev_maps, j, offset,
 					       count);
 	if (!active)
@@ -2551,7 +2551,6 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 {
 	const unsigned long *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps;
-	unsigned int nr_ids;
 
 	if (!static_key_false(&xps_needed))
 		return;
@@ -2561,11 +2560,9 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 
 	if (static_key_false(&xps_rxqs_needed)) {
 		dev_maps = xmap_dereference(dev->xps_rxqs_map);
-		if (dev_maps) {
-			nr_ids = dev->num_rx_queues;
-			clean_xps_maps(dev, possible_mask, dev_maps, nr_ids,
-				       offset, count, true);
-		}
+		if (dev_maps)
+			clean_xps_maps(dev, possible_mask, dev_maps, offset,
+				       count, true);
 	}
 
 	dev_maps = xmap_dereference(dev->xps_cpus_map);
@@ -2574,9 +2571,7 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 
 	if (num_possible_cpus() > 1)
 		possible_mask = cpumask_bits(cpu_possible_mask);
-	nr_ids = nr_cpu_ids;
-	clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset, count,
-		       false);
+	clean_xps_maps(dev, possible_mask, dev_maps, offset, count, false);
 
 out_no_maps:
 	mutex_unlock(&xps_map_mutex);
@@ -2673,11 +2668,12 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 		maps_sz = L1_CACHE_BYTES;
 
 	/* The old dev_maps could be larger or smaller than the one we're
-	 * setting up now, as dev->num_tc could have been updated in between. We
-	 * could try to be smart, but let's be safe instead and only copy
-	 * foreign traffic classes if the two map sizes match.
+	 * setting up now, as dev->num_tc or nr_ids could have been updated in
+	 * between. We could try to be smart, but let's be safe instead and only
+	 * copy foreign traffic classes if the two map sizes match.
 	 */
-	if (dev_maps && dev_maps->num_tc == num_tc)
+	if (dev_maps &&
+	    dev_maps->num_tc == num_tc && dev_maps->nr_ids == nr_ids)
 		copy = true;
 
 	/* allocate memory for queue storage */
@@ -2690,6 +2686,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 				return -ENOMEM;
 			}
 
+			new_dev_maps->nr_ids = nr_ids;
 			new_dev_maps->num_tc = num_tc;
 		}
 
@@ -2770,7 +2767,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 		goto out_no_old_maps;
 
 	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
-	     j < nr_ids;) {
+	     j < dev_maps->nr_ids;) {
 		for (i = num_tc, tci = j * dev_maps->num_tc; i--; tci++) {
 			map = xmap_dereference(dev_maps->attr_map[tci]);
 			if (!map)
@@ -2804,12 +2801,12 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 		goto out_no_maps;
 
 	/* removes tx-queue from unused CPUs/rx-queues */
-	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
-	     j < nr_ids;) {
+	for (j = -1; j = netif_attrmask_next(j, possible_mask, dev_maps->nr_ids),
+	     j < dev_maps->nr_ids;) {
 		for (i = tc, tci = j * dev_maps->num_tc; i--; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
-		if (!netif_attr_test_mask(j, mask, nr_ids) ||
-		    !netif_attr_test_online(j, online_mask, nr_ids))
+		if (!netif_attr_test_mask(j, mask, dev_maps->nr_ids) ||
+		    !netif_attr_test_online(j, online_mask, dev_maps->nr_ids))
 			active |= remove_xps_queue(dev_maps, tci, index);
 		for (i = dev_maps->num_tc - tc, tci++; --i; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
@@ -3965,7 +3962,7 @@ static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
 	struct xps_map *map;
 	int queue_index = -1;
 
-	if (tc >= dev_maps->num_tc)
+	if (tc >= dev_maps->num_tc || tci >= dev_maps->nr_ids)
 		return queue_index;
 
 	tci *= dev_maps->num_tc;
@@ -4004,7 +4001,7 @@ static int get_xps_queue(struct net_device *dev, struct net_device *sb_dev,
 	if (dev_maps) {
 		int tci = sk_rx_queue_get(sk);
 
-		if (tci >= 0 && tci < dev->num_rx_queues)
+		if (tci >= 0)
 			queue_index = __get_xps_queue_idx(dev, skb, dev_maps,
 							  tci);
 	}
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1364d0f39cb0..bb08bdc88fa9 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1366,9 +1366,9 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 {
 	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
+	unsigned int index, nr_ids;
 	int j, len, ret, tc = 0;
 	unsigned long *mask;
-	unsigned int index;
 
 	if (!netif_is_multiqueue(dev))
 		return -ENOENT;
@@ -1387,19 +1387,20 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 		goto err_rtnl_unlock;
 	}
 
-	mask = bitmap_zalloc(nr_cpu_ids, GFP_KERNEL);
+	rcu_read_lock();
+	dev_maps = rcu_dereference(dev->xps_cpus_map);
+	nr_ids = dev_maps ? dev_maps->nr_ids : nr_cpu_ids;
+
+	mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
 	if (!mask) {
 		ret = -ENOMEM;
-		goto err_rtnl_unlock;
+		goto err_rcu_unlock;
 	}
 
-	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_cpus_map);
 	if (!dev_maps || tc >= dev_maps->num_tc)
 		goto out_no_maps;
 
-	for (j = -1; j = netif_attrmask_next(j, NULL, nr_cpu_ids),
-	     j < nr_cpu_ids;) {
+	for (j = -1; j = netif_attrmask_next(j, NULL, nr_ids), j < nr_ids;) {
 		int i, tci = j * dev_maps->num_tc + tc;
 		struct xps_map *map;
 
@@ -1419,10 +1420,12 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 
 	rtnl_unlock();
 
-	len = bitmap_print_to_pagebuf(false, buf, mask, nr_cpu_ids);
+	len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids);
 	bitmap_free(mask);
 	return len < PAGE_SIZE ? len : -EINVAL;
 
+err_rcu_unlock:
+	rcu_read_unlock();
 err_rtnl_unlock:
 	rtnl_unlock();
 	return ret;
@@ -1473,9 +1476,9 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 {
 	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
+	unsigned int index, nr_ids;
 	int j, len, ret, tc = 0;
 	unsigned long *mask;
-	unsigned int index;
 
 	index = get_netdev_queue_index(queue);
 
@@ -1488,19 +1491,20 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 		goto err_rtnl_unlock;
 	}
 
-	mask = bitmap_zalloc(dev->num_rx_queues, GFP_KERNEL);
+	rcu_read_lock();
+	dev_maps = rcu_dereference(dev->xps_rxqs_map);
+	nr_ids = dev_maps ? dev_maps->nr_ids : dev->num_rx_queues;
+
+	mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
 	if (!mask) {
 		ret = -ENOMEM;
-		goto err_rtnl_unlock;
+		goto err_rcu_unlock;
 	}
 
-	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_rxqs_map);
 	if (!dev_maps || tc >= dev_maps->num_tc)
 		goto out_no_maps;
 
-	for (j = -1; j = netif_attrmask_next(j, NULL, dev->num_rx_queues),
-	     j < dev->num_rx_queues;) {
+	for (j = -1; j = netif_attrmask_next(j, NULL, nr_ids), j < nr_ids;) {
 		int i, tci = j * dev_maps->num_tc + tc;
 		struct xps_map *map;
 
@@ -1520,11 +1524,13 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 
 	rtnl_unlock();
 
-	len = bitmap_print_to_pagebuf(false, buf, mask, dev->num_rx_queues);
+	len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids);
 	bitmap_free(mask);
 
 	return len < PAGE_SIZE ? len : -EINVAL;
 
+err_rcu_unlock:
+	rcu_read_unlock();
 err_rtnl_unlock:
 	rtnl_unlock();
 	return ret;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 06/16] net: assert the rtnl lock is held when calling __netif_set_xps_queue
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (4 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 05/16] net: embed nr_ids " Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 07/16] net: remove the xps possible_mask Antoine Tenart
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Add ASSERT_RTNL at the top of __netif_set_xps_queue and add a comment
about holding the rtnl lock before the function.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/dev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 98a9c620f05a..24d8f059e2a6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2623,7 +2623,7 @@ static struct xps_map *expand_xps_map(struct xps_map *map, int attr_index,
 	return new_map;
 }
 
-/* Must be called under cpus_read_lock */
+/* Must be called under rtnl_lock and cpus_read_lock */
 int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 			  u16 index, bool is_rxqs_map)
 {
@@ -2635,6 +2635,8 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	struct xps_map *map, *new_map;
 	unsigned int nr_ids;
 
+	ASSERT_RTNL();
+
 	if (dev->num_tc) {
 		/* Do not allow XPS on subordinate device directly */
 		num_tc = dev->num_tc;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 07/16] net: remove the xps possible_mask
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (5 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 06/16] net: assert the rtnl lock is held when calling __netif_set_xps_queue Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 08/16] net: move the xps maps to an array Antoine Tenart
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Remove the xps possible_mask. It was an optimization but we can just
loop from 0 to nr_ids now that it is embedded in the xps dev_maps. That
simplifies the code a bit.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/dev.c       | 40 +++++++++++++---------------------------
 net/core/net-sysfs.c |  4 ++--
 2 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 24d8f059e2a6..241f440b306a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2523,33 +2523,28 @@ static void reset_xps_maps(struct net_device *dev,
 	kfree_rcu(dev_maps, rcu);
 }
 
-static void clean_xps_maps(struct net_device *dev, const unsigned long *mask,
+static void clean_xps_maps(struct net_device *dev,
 			   struct xps_dev_maps *dev_maps, u16 offset, u16 count,
 			   bool is_rxqs_map)
 {
-	unsigned int nr_ids = dev_maps->nr_ids;
 	bool active = false;
 	int i, j;
 
-	for (j = -1; j = netif_attrmask_next(j, mask, nr_ids), j < nr_ids;)
-		active |= remove_xps_queue_cpu(dev, dev_maps, j, offset,
-					       count);
+	for (j = 0; j < dev_maps->nr_ids; j++)
+		active |= remove_xps_queue_cpu(dev, dev_maps, j, offset, count);
 	if (!active)
 		reset_xps_maps(dev, dev_maps, is_rxqs_map);
 
 	if (!is_rxqs_map) {
-		for (i = offset + (count - 1); count--; i--) {
+		for (i = offset + (count - 1); count--; i--)
 			netdev_queue_numa_node_write(
-				netdev_get_tx_queue(dev, i),
-				NUMA_NO_NODE);
-		}
+				netdev_get_tx_queue(dev, i), NUMA_NO_NODE);
 	}
 }
 
 static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 				   u16 count)
 {
-	const unsigned long *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps;
 
 	if (!static_key_false(&xps_needed))
@@ -2561,17 +2556,14 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 	if (static_key_false(&xps_rxqs_needed)) {
 		dev_maps = xmap_dereference(dev->xps_rxqs_map);
 		if (dev_maps)
-			clean_xps_maps(dev, possible_mask, dev_maps, offset,
-				       count, true);
+			clean_xps_maps(dev, dev_maps, offset, count, true);
 	}
 
 	dev_maps = xmap_dereference(dev->xps_cpus_map);
 	if (!dev_maps)
 		goto out_no_maps;
 
-	if (num_possible_cpus() > 1)
-		possible_mask = cpumask_bits(cpu_possible_mask);
-	clean_xps_maps(dev, possible_mask, dev_maps, offset, count, false);
+	clean_xps_maps(dev, dev_maps, offset, count, false);
 
 out_no_maps:
 	mutex_unlock(&xps_map_mutex);
@@ -2627,8 +2619,8 @@ static struct xps_map *expand_xps_map(struct xps_map *map, int attr_index,
 int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 			  u16 index, bool is_rxqs_map)
 {
-	const unsigned long *online_mask = NULL, *possible_mask = NULL;
 	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL;
+	const unsigned long *online_mask = NULL;
 	bool active = false, copy = false;
 	int i, j, tci, numa_node_id = -2;
 	int maps_sz, num_tc = 1, tc = 0;
@@ -2658,10 +2650,8 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 		nr_ids = dev->num_rx_queues;
 	} else {
 		maps_sz = XPS_CPU_DEV_MAPS_SIZE(num_tc);
-		if (num_possible_cpus() > 1) {
+		if (num_possible_cpus() > 1)
 			online_mask = cpumask_bits(cpu_online_mask);
-			possible_mask = cpumask_bits(cpu_possible_mask);
-		}
 		dev_maps = xmap_dereference(dev->xps_cpus_map);
 		nr_ids = nr_cpu_ids;
 	}
@@ -2712,8 +2702,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 			static_key_slow_inc_cpuslocked(&xps_rxqs_needed);
 	}
 
-	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
-	     j < nr_ids;) {
+	for (j = 0; j < nr_ids; j++) {
 		/* copy maps belonging to foreign traffic classes */
 		for (i = tc, tci = j * num_tc; copy && i--; tci++) {
 			/* fill in the new device map from the old device map */
@@ -2768,8 +2757,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	if (!dev_maps)
 		goto out_no_old_maps;
 
-	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
-	     j < dev_maps->nr_ids;) {
+	for (j = 0; j < dev_maps->nr_ids; j++) {
 		for (i = num_tc, tci = j * dev_maps->num_tc; i--; tci++) {
 			map = xmap_dereference(dev_maps->attr_map[tci]);
 			if (!map)
@@ -2803,8 +2791,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 		goto out_no_maps;
 
 	/* removes tx-queue from unused CPUs/rx-queues */
-	for (j = -1; j = netif_attrmask_next(j, possible_mask, dev_maps->nr_ids),
-	     j < dev_maps->nr_ids;) {
+	for (j = 0; j < dev_maps->nr_ids; j++) {
 		for (i = tc, tci = j * dev_maps->num_tc; i--; tci++)
 			active |= remove_xps_queue(dev_maps, tci, index);
 		if (!netif_attr_test_mask(j, mask, dev_maps->nr_ids) ||
@@ -2824,8 +2811,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	return 0;
 error:
 	/* remove any maps that we added */
-	for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
-	     j < nr_ids;) {
+	for (j = 0; j < nr_ids; j++) {
 		for (i = num_tc, tci = j * num_tc; i--; tci++) {
 			new_map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			map = copy ?
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bb08bdc88fa9..c762c435ff76 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1400,7 +1400,7 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 	if (!dev_maps || tc >= dev_maps->num_tc)
 		goto out_no_maps;
 
-	for (j = -1; j = netif_attrmask_next(j, NULL, nr_ids), j < nr_ids;) {
+	for (j = 0; j < nr_ids; j++) {
 		int i, tci = j * dev_maps->num_tc + tc;
 		struct xps_map *map;
 
@@ -1504,7 +1504,7 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 	if (!dev_maps || tc >= dev_maps->num_tc)
 		goto out_no_maps;
 
-	for (j = -1; j = netif_attrmask_next(j, NULL, nr_ids), j < nr_ids;) {
+	for (j = 0; j < nr_ids; j++) {
 		int i, tci = j * dev_maps->num_tc + tc;
 		struct xps_map *map;
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 08/16] net: move the xps maps to an array
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (6 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 07/16] net: remove the xps possible_mask Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 09/16] net: add an helper to copy xps maps to the new dev_maps Antoine Tenart
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in
net_device. That will simplify a lot the code removing the need for lots
of if/else conditionals as the correct map will be available using its
offset in the array.

This should not modify the xps maps behaviour in any way.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 drivers/net/virtio_net.c  |  2 +-
 include/linux/netdevice.h | 17 +++++----
 net/core/dev.c            | 73 +++++++++++++++++----------------------
 net/core/net-sysfs.c      |  6 ++--
 4 files changed, 46 insertions(+), 52 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e97288dd6e5a..dde9bbcc5ff0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2015,7 +2015,7 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
 		}
 		virtqueue_set_affinity(vi->rq[i].vq, mask);
 		virtqueue_set_affinity(vi->sq[i].vq, mask);
-		__netif_set_xps_queue(vi->dev, cpumask_bits(mask), i, false);
+		__netif_set_xps_queue(vi->dev, cpumask_bits(mask), i, XPS_CPUS);
 		cpumask_clear(mask);
 	}
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5c9e056a0e2d..bcd15a2d3ddc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -754,6 +754,13 @@ struct rx_queue_attribute {
 			 const char *buf, size_t len);
 };
 
+/* XPS map type and offset of the xps map within net_device->xps_maps[]. */
+enum xps_map_type {
+	XPS_CPUS = 0,
+	XPS_RXQS,
+	XPS_MAPS_MAX,
+};
+
 #ifdef CONFIG_XPS
 /*
  * This structure holds an XPS map which can be of variable length.  The
@@ -1773,8 +1780,7 @@ enum netdev_ml_priv_type {
  *	@tx_queue_len:		Max frames per queue allowed
  *	@tx_global_lock: 	XXX: need comments on this one
  *	@xdp_bulkq:		XDP device bulk queue
- *	@xps_cpus_map:		all CPUs map for XPS device
- *	@xps_rxqs_map:		all RXQs map for XPS device
+ *	@xps_maps:		all CPUs/RXQs maps for XPS device
  *
  *	@xps_maps:	XXX: need comments on this one
  *	@miniq_egress:		clsact qdisc specific data for
@@ -2070,8 +2076,7 @@ struct net_device {
 	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
 
 #ifdef CONFIG_XPS
-	struct xps_dev_maps __rcu *xps_cpus_map;
-	struct xps_dev_maps __rcu *xps_rxqs_map;
+	struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX];
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc __rcu	*miniq_egress;
@@ -3701,7 +3706,7 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
 int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 			u16 index);
 int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
-			  u16 index, bool is_rxqs_map);
+			  u16 index, enum xps_map_type type);
 
 /**
  *	netif_attr_test_mask - Test a CPU or Rx queue set in a mask
@@ -3796,7 +3801,7 @@ static inline int netif_set_xps_queue(struct net_device *dev,
 
 static inline int __netif_set_xps_queue(struct net_device *dev,
 					const unsigned long *mask,
-					u16 index, bool is_rxqs_map)
+					u16 index, enum xps_map_type type)
 {
 	return 0;
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 241f440b306a..dfdd476a6d67 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2511,31 +2511,34 @@ static bool remove_xps_queue_cpu(struct net_device *dev,
 
 static void reset_xps_maps(struct net_device *dev,
 			   struct xps_dev_maps *dev_maps,
-			   bool is_rxqs_map)
+			   enum xps_map_type type)
 {
-	if (is_rxqs_map) {
-		static_key_slow_dec_cpuslocked(&xps_rxqs_needed);
-		RCU_INIT_POINTER(dev->xps_rxqs_map, NULL);
-	} else {
-		RCU_INIT_POINTER(dev->xps_cpus_map, NULL);
-	}
 	static_key_slow_dec_cpuslocked(&xps_needed);
+	if (type == XPS_RXQS)
+		static_key_slow_dec_cpuslocked(&xps_rxqs_needed);
+
+	RCU_INIT_POINTER(dev->xps_maps[type], NULL);
+
 	kfree_rcu(dev_maps, rcu);
 }
 
-static void clean_xps_maps(struct net_device *dev,
-			   struct xps_dev_maps *dev_maps, u16 offset, u16 count,
-			   bool is_rxqs_map)
+static void clean_xps_maps(struct net_device *dev, enum xps_map_type type,
+			   u16 offset, u16 count)
 {
+	struct xps_dev_maps *dev_maps;
 	bool active = false;
 	int i, j;
 
+	dev_maps = xmap_dereference(dev->xps_maps[type]);
+	if (!dev_maps)
+		return;
+
 	for (j = 0; j < dev_maps->nr_ids; j++)
 		active |= remove_xps_queue_cpu(dev, dev_maps, j, offset, count);
 	if (!active)
-		reset_xps_maps(dev, dev_maps, is_rxqs_map);
+		reset_xps_maps(dev, dev_maps, type);
 
-	if (!is_rxqs_map) {
+	if (type == XPS_CPUS) {
 		for (i = offset + (count - 1); count--; i--)
 			netdev_queue_numa_node_write(
 				netdev_get_tx_queue(dev, i), NUMA_NO_NODE);
@@ -2545,27 +2548,17 @@ static void clean_xps_maps(struct net_device *dev,
 static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 				   u16 count)
 {
-	struct xps_dev_maps *dev_maps;
-
 	if (!static_key_false(&xps_needed))
 		return;
 
 	cpus_read_lock();
 	mutex_lock(&xps_map_mutex);
 
-	if (static_key_false(&xps_rxqs_needed)) {
-		dev_maps = xmap_dereference(dev->xps_rxqs_map);
-		if (dev_maps)
-			clean_xps_maps(dev, dev_maps, offset, count, true);
-	}
-
-	dev_maps = xmap_dereference(dev->xps_cpus_map);
-	if (!dev_maps)
-		goto out_no_maps;
+	if (static_key_false(&xps_rxqs_needed))
+		clean_xps_maps(dev, XPS_RXQS, offset, count);
 
-	clean_xps_maps(dev, dev_maps, offset, count, false);
+	clean_xps_maps(dev, XPS_CPUS, offset, count);
 
-out_no_maps:
 	mutex_unlock(&xps_map_mutex);
 	cpus_read_unlock();
 }
@@ -2617,7 +2610,7 @@ static struct xps_map *expand_xps_map(struct xps_map *map, int attr_index,
 
 /* Must be called under rtnl_lock and cpus_read_lock */
 int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
-			  u16 index, bool is_rxqs_map)
+			  u16 index, enum xps_map_type type)
 {
 	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL;
 	const unsigned long *online_mask = NULL;
@@ -2644,15 +2637,15 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	}
 
 	mutex_lock(&xps_map_mutex);
-	if (is_rxqs_map) {
+
+	dev_maps = xmap_dereference(dev->xps_maps[type]);
+	if (type == XPS_RXQS) {
 		maps_sz = XPS_RXQ_DEV_MAPS_SIZE(num_tc, dev->num_rx_queues);
-		dev_maps = xmap_dereference(dev->xps_rxqs_map);
 		nr_ids = dev->num_rx_queues;
 	} else {
 		maps_sz = XPS_CPU_DEV_MAPS_SIZE(num_tc);
 		if (num_possible_cpus() > 1)
 			online_mask = cpumask_bits(cpu_online_mask);
-		dev_maps = xmap_dereference(dev->xps_cpus_map);
 		nr_ids = nr_cpu_ids;
 	}
 
@@ -2685,7 +2678,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 		tci = j * num_tc + tc;
 		map = copy ? xmap_dereference(dev_maps->attr_map[tci]) : NULL;
 
-		map = expand_xps_map(map, j, index, is_rxqs_map);
+		map = expand_xps_map(map, j, index, type == XPS_RXQS);
 		if (!map)
 			goto error;
 
@@ -2698,7 +2691,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	if (!dev_maps) {
 		/* Increment static keys at most once per type */
 		static_key_slow_inc_cpuslocked(&xps_needed);
-		if (is_rxqs_map)
+		if (type == XPS_RXQS)
 			static_key_slow_inc_cpuslocked(&xps_rxqs_needed);
 	}
 
@@ -2727,7 +2720,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 			if (pos == map->len)
 				map->queues[map->len++] = index;
 #ifdef CONFIG_NUMA
-			if (!is_rxqs_map) {
+			if (type == XPS_CPUS) {
 				if (numa_node_id == -2)
 					numa_node_id = cpu_to_node(j);
 				else if (numa_node_id != cpu_to_node(j))
@@ -2748,10 +2741,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 		}
 	}
 
-	if (is_rxqs_map)
-		rcu_assign_pointer(dev->xps_rxqs_map, new_dev_maps);
-	else
-		rcu_assign_pointer(dev->xps_cpus_map, new_dev_maps);
+	rcu_assign_pointer(dev->xps_maps[type], new_dev_maps);
 
 	/* Cleanup old maps */
 	if (!dev_maps)
@@ -2780,12 +2770,11 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	active = true;
 
 out_no_new_maps:
-	if (!is_rxqs_map) {
+	if (type == XPS_CPUS)
 		/* update Tx queue numa node */
 		netdev_queue_numa_node_write(netdev_get_tx_queue(dev, index),
 					     (numa_node_id >= 0) ?
 					     numa_node_id : NUMA_NO_NODE);
-	}
 
 	if (!dev_maps)
 		goto out_no_maps;
@@ -2803,7 +2792,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 
 	/* free map if not active */
 	if (!active)
-		reset_xps_maps(dev, dev_maps, is_rxqs_map);
+		reset_xps_maps(dev, dev_maps, type);
 
 out_no_maps:
 	mutex_unlock(&xps_map_mutex);
@@ -2835,7 +2824,7 @@ int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
 	int ret;
 
 	cpus_read_lock();
-	ret =  __netif_set_xps_queue(dev, cpumask_bits(mask), index, false);
+	ret =  __netif_set_xps_queue(dev, cpumask_bits(mask), index, XPS_CPUS);
 	cpus_read_unlock();
 
 	return ret;
@@ -3985,7 +3974,7 @@ static int get_xps_queue(struct net_device *dev, struct net_device *sb_dev,
 	if (!static_key_false(&xps_rxqs_needed))
 		goto get_cpus_map;
 
-	dev_maps = rcu_dereference(sb_dev->xps_rxqs_map);
+	dev_maps = rcu_dereference(sb_dev->xps_maps[XPS_RXQS]);
 	if (dev_maps) {
 		int tci = sk_rx_queue_get(sk);
 
@@ -3996,7 +3985,7 @@ static int get_xps_queue(struct net_device *dev, struct net_device *sb_dev,
 
 get_cpus_map:
 	if (queue_index < 0) {
-		dev_maps = rcu_dereference(sb_dev->xps_cpus_map);
+		dev_maps = rcu_dereference(sb_dev->xps_maps[XPS_CPUS]);
 		if (dev_maps) {
 			unsigned int tci = skb->sender_cpu - 1;
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index c762c435ff76..ca1f3b63cfad 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1388,7 +1388,7 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 	}
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_cpus_map);
+	dev_maps = rcu_dereference(dev->xps_maps[XPS_CPUS]);
 	nr_ids = dev_maps ? dev_maps->nr_ids : nr_cpu_ids;
 
 	mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
@@ -1492,7 +1492,7 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 	}
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_rxqs_map);
+	dev_maps = rcu_dereference(dev->xps_maps[XPS_RXQS]);
 	nr_ids = dev_maps ? dev_maps->nr_ids : dev->num_rx_queues;
 
 	mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
@@ -1566,7 +1566,7 @@ static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
 	}
 
 	cpus_read_lock();
-	err = __netif_set_xps_queue(dev, mask, index, true);
+	err = __netif_set_xps_queue(dev, mask, index, XPS_RXQS);
 	cpus_read_unlock();
 
 	rtnl_unlock();
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 09/16] net: add an helper to copy xps maps to the new dev_maps
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (7 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 08/16] net: move the xps maps to an array Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 10/16] net: improve queue removal readability in __netif_set_xps_queue Antoine Tenart
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

This patch adds an helper, xps_copy_dev_maps, to copy maps from dev_maps
to new_dev_maps at a given index. The logic should be the same, with an
improved code readability and maintenance.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/dev.c | 45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index dfdd476a6d67..4d39938417c4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2608,6 +2608,25 @@ static struct xps_map *expand_xps_map(struct xps_map *map, int attr_index,
 	return new_map;
 }
 
+/* Copy xps maps at a given index */
+static void xps_copy_dev_maps(struct xps_dev_maps *dev_maps,
+			      struct xps_dev_maps *new_dev_maps, int index,
+			      int tc, bool skip_tc)
+{
+	int i, tci = index * dev_maps->num_tc;
+	struct xps_map *map;
+
+	/* copy maps belonging to foreign traffic classes */
+	for (i = 0; i < dev_maps->num_tc; i++, tci++) {
+		if (i == tc && skip_tc)
+			continue;
+
+		/* fill in the new device map from the old device map */
+		map = xmap_dereference(dev_maps->attr_map[tci]);
+		RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
+	}
+}
+
 /* Must be called under rtnl_lock and cpus_read_lock */
 int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 			  u16 index, enum xps_map_type type)
@@ -2696,23 +2715,16 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 	}
 
 	for (j = 0; j < nr_ids; j++) {
-		/* copy maps belonging to foreign traffic classes */
-		for (i = tc, tci = j * num_tc; copy && i--; tci++) {
-			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->attr_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
-		}
+		bool skip_tc = false;
 
-		/* We need to explicitly update tci as prevous loop
-		 * could break out early if dev_maps is NULL.
-		 */
 		tci = j * num_tc + tc;
-
 		if (netif_attr_test_mask(j, mask, nr_ids) &&
 		    netif_attr_test_online(j, online_mask, nr_ids)) {
 			/* add tx-queue to CPU/rx-queue maps */
 			int pos = 0;
 
+			skip_tc = true;
+
 			map = xmap_dereference(new_dev_maps->attr_map[tci]);
 			while ((pos < map->len) && (map->queues[pos] != index))
 				pos++;
@@ -2727,18 +2739,11 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 					numa_node_id = -1;
 			}
 #endif
-		} else if (copy) {
-			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->attr_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
 		}
 
-		/* copy maps belonging to foreign traffic classes */
-		for (i = num_tc - tc, tci++; copy && --i; tci++) {
-			/* fill in the new device map from the old device map */
-			map = xmap_dereference(dev_maps->attr_map[tci]);
-			RCU_INIT_POINTER(new_dev_maps->attr_map[tci], map);
-		}
+		if (copy)
+			xps_copy_dev_maps(dev_maps, new_dev_maps, j, tc,
+					  skip_tc);
 	}
 
 	rcu_assign_pointer(dev->xps_maps[type], new_dev_maps);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 10/16] net: improve queue removal readability in __netif_set_xps_queue
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (8 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 09/16] net: add an helper to copy xps maps to the new dev_maps Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 11/16] net-sysfs: move the rtnl unlock up in the xps show helpers Antoine Tenart
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Improve the readability of the loop removing tx-queue from unused
CPUs/rx-queues in __netif_set_xps_queue. The change should only be
cosmetic.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/dev.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 4d39938417c4..052797ca65f6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2786,13 +2786,16 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 
 	/* removes tx-queue from unused CPUs/rx-queues */
 	for (j = 0; j < dev_maps->nr_ids; j++) {
-		for (i = tc, tci = j * dev_maps->num_tc; i--; tci++)
-			active |= remove_xps_queue(dev_maps, tci, index);
-		if (!netif_attr_test_mask(j, mask, dev_maps->nr_ids) ||
-		    !netif_attr_test_online(j, online_mask, dev_maps->nr_ids))
-			active |= remove_xps_queue(dev_maps, tci, index);
-		for (i = dev_maps->num_tc - tc, tci++; --i; tci++)
+		tci = j * dev_maps->num_tc;
+
+		for (i = 0; i < dev_maps->num_tc; i++, tci++) {
+			if (i == tc &&
+			    netif_attr_test_mask(j, mask, dev_maps->nr_ids) &&
+			    netif_attr_test_online(j, online_mask, dev_maps->nr_ids))
+				continue;
+
 			active |= remove_xps_queue(dev_maps, tci, index);
+		}
 	}
 
 	/* free map if not active */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 11/16] net-sysfs: move the rtnl unlock up in the xps show helpers
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (9 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 10/16] net: improve queue removal readability in __netif_set_xps_queue Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 12/16] net-sysfs: move the xps cpus/rxqs retrieval in a common function Antoine Tenart
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU
protected, we do not have the need to protect the maps in the rtnl lock.
Move the rtnl unlock up so we reduce the rtnl locking section.

We also increase the reference count on the subordinate device if any,
as we don't want this device to be freed while we use it (now that the
rtnl lock isn't protecting it in the whole function).

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/net-sysfs.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index ca1f3b63cfad..094fea082649 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1383,10 +1383,14 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 
 	tc = netdev_txq_to_tc(dev, index);
 	if (tc < 0) {
-		ret = -EINVAL;
-		goto err_rtnl_unlock;
+		rtnl_unlock();
+		return -EINVAL;
 	}
 
+	/* Make sure the subordinate device can't be freed */
+	get_device(&dev->dev);
+	rtnl_unlock();
+
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_maps[XPS_CPUS]);
 	nr_ids = dev_maps ? dev_maps->nr_ids : nr_cpu_ids;
@@ -1417,8 +1421,7 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 	}
 out_no_maps:
 	rcu_read_unlock();
-
-	rtnl_unlock();
+	put_device(&dev->dev);
 
 	len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids);
 	bitmap_free(mask);
@@ -1426,8 +1429,7 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 
 err_rcu_unlock:
 	rcu_read_unlock();
-err_rtnl_unlock:
-	rtnl_unlock();
+	put_device(&dev->dev);
 	return ret;
 }
 
@@ -1486,10 +1488,9 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 		return restart_syscall();
 
 	tc = netdev_txq_to_tc(dev, index);
-	if (tc < 0) {
-		ret = -EINVAL;
-		goto err_rtnl_unlock;
-	}
+	rtnl_unlock();
+	if (tc < 0)
+		return -EINVAL;
 
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_maps[XPS_RXQS]);
@@ -1522,8 +1523,6 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 out_no_maps:
 	rcu_read_unlock();
 
-	rtnl_unlock();
-
 	len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids);
 	bitmap_free(mask);
 
@@ -1531,8 +1530,6 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 
 err_rcu_unlock:
 	rcu_read_unlock();
-err_rtnl_unlock:
-	rtnl_unlock();
 	return ret;
 }
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 12/16] net-sysfs: move the xps cpus/rxqs retrieval in a common function
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (10 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 11/16] net-sysfs: move the rtnl unlock up in the xps show helpers Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 13/16] net: fix use after free in xps Antoine Tenart
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

Most of the xps_cpus_show and xps_rxqs_show functions share the same
logic. Having it in two different functions does not help maintenance.
This patch moves their common logic into a new function, xps_queue_show,
to improve this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/net-sysfs.c | 125 +++++++++++++++++--------------------------
 1 file changed, 48 insertions(+), 77 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 094fea082649..562a42fcd437 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1361,44 +1361,27 @@ static const struct attribute_group dql_group = {
 #endif /* CONFIG_BQL */
 
 #ifdef CONFIG_XPS
-static ssize_t xps_cpus_show(struct netdev_queue *queue,
-			     char *buf)
+static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
+			      int tc, char *buf, enum xps_map_type type)
 {
-	struct net_device *dev = queue->dev;
 	struct xps_dev_maps *dev_maps;
-	unsigned int index, nr_ids;
-	int j, len, ret, tc = 0;
 	unsigned long *mask;
-
-	if (!netif_is_multiqueue(dev))
-		return -ENOENT;
-
-	index = get_netdev_queue_index(queue);
-
-	if (!rtnl_trylock())
-		return restart_syscall();
-
-	/* If queue belongs to subordinate dev use its map */
-	dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
-
-	tc = netdev_txq_to_tc(dev, index);
-	if (tc < 0) {
-		rtnl_unlock();
-		return -EINVAL;
-	}
-
-	/* Make sure the subordinate device can't be freed */
-	get_device(&dev->dev);
-	rtnl_unlock();
+	unsigned int nr_ids;
+	int j, len;
 
 	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps[XPS_CPUS]);
-	nr_ids = dev_maps ? dev_maps->nr_ids : nr_cpu_ids;
+	dev_maps = rcu_dereference(dev->xps_maps[type]);
+
+	/* Default to nr_cpu_ids/dev->num_rx_queues and do not just return 0
+	 * when dev_maps hasn't been allocated yet, to be backward compatible.
+	 */
+	nr_ids = dev_maps ? dev_maps->nr_ids :
+		 (type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues);
 
 	mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
 	if (!mask) {
-		ret = -ENOMEM;
-		goto err_rcu_unlock;
+		rcu_read_unlock();
+		return -ENOMEM;
 	}
 
 	if (!dev_maps || tc >= dev_maps->num_tc)
@@ -1421,16 +1404,44 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 	}
 out_no_maps:
 	rcu_read_unlock();
-	put_device(&dev->dev);
 
 	len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids);
 	bitmap_free(mask);
+
 	return len < PAGE_SIZE ? len : -EINVAL;
+}
+
+static ssize_t xps_cpus_show(struct netdev_queue *queue, char *buf)
+{
+	struct net_device *dev = queue->dev;
+	unsigned int index;
+	int len, tc;
+
+	if (!netif_is_multiqueue(dev))
+		return -ENOENT;
+
+	index = get_netdev_queue_index(queue);
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	/* If queue belongs to subordinate dev use its map */
+	dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
+
+	tc = netdev_txq_to_tc(dev, index);
+	if (tc < 0) {
+		rtnl_unlock();
+		return -EINVAL;
+	}
+
+	/* Make sure the subordinate device can't be freed */
+	get_device(&dev->dev);
+	rtnl_unlock();
+
+	len = xps_queue_show(dev, index, tc, buf, XPS_CPUS);
 
-err_rcu_unlock:
-	rcu_read_unlock();
 	put_device(&dev->dev);
-	return ret;
+	return len;
 }
 
 static ssize_t xps_cpus_store(struct netdev_queue *queue,
@@ -1477,10 +1488,8 @@ static struct netdev_queue_attribute xps_cpus_attribute __ro_after_init
 static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 {
 	struct net_device *dev = queue->dev;
-	struct xps_dev_maps *dev_maps;
-	unsigned int index, nr_ids;
-	int j, len, ret, tc = 0;
-	unsigned long *mask;
+	unsigned int index;
+	int tc;
 
 	index = get_netdev_queue_index(queue);
 
@@ -1492,45 +1501,7 @@ static ssize_t xps_rxqs_show(struct netdev_queue *queue, char *buf)
 	if (tc < 0)
 		return -EINVAL;
 
-	rcu_read_lock();
-	dev_maps = rcu_dereference(dev->xps_maps[XPS_RXQS]);
-	nr_ids = dev_maps ? dev_maps->nr_ids : dev->num_rx_queues;
-
-	mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
-	if (!mask) {
-		ret = -ENOMEM;
-		goto err_rcu_unlock;
-	}
-
-	if (!dev_maps || tc >= dev_maps->num_tc)
-		goto out_no_maps;
-
-	for (j = 0; j < nr_ids; j++) {
-		int i, tci = j * dev_maps->num_tc + tc;
-		struct xps_map *map;
-
-		map = rcu_dereference(dev_maps->attr_map[tci]);
-		if (!map)
-			continue;
-
-		for (i = map->len; i--;) {
-			if (map->queues[i] == index) {
-				set_bit(j, mask);
-				break;
-			}
-		}
-	}
-out_no_maps:
-	rcu_read_unlock();
-
-	len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids);
-	bitmap_free(mask);
-
-	return len < PAGE_SIZE ? len : -EINVAL;
-
-err_rcu_unlock:
-	rcu_read_unlock();
-	return ret;
+	return xps_queue_show(dev, index, tc, buf, XPS_RXQS);
 }
 
 static ssize_t xps_rxqs_store(struct netdev_queue *queue, const char *buf,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 13/16] net: fix use after free in xps
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (11 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 12/16] net-sysfs: move the xps cpus/rxqs retrieval in a common function Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 14/16] net: NULL the old xps map entries when freeing them Antoine Tenart
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

When setting up an new dev_maps in __netif_set_xps_queue, we remove and
free maps from unused CPUs/rx-queues near the end of the function; by
calling remove_xps_queue. However it's possible those maps are also part
of the old not-freed-yet dev_maps, which might be used concurrently.
When that happens, a map can be freed while its corresponding entry in
the old dev_maps table isn't NULLed, leading to:

  BUG: KASAN: use-after-free in xps_queue_show+0x469/0x480

This fixes the map freeing logic for unused CPUs/rx-queues, to also NULL
the map entries from the old dev_maps table.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/dev.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 052797ca65f6..748e377c7fe3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2460,7 +2460,7 @@ static DEFINE_MUTEX(xps_map_mutex);
 	rcu_dereference_protected((P), lockdep_is_held(&xps_map_mutex))
 
 static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
-			     int tci, u16 index)
+			     struct xps_dev_maps *old_maps, int tci, u16 index)
 {
 	struct xps_map *map = NULL;
 	int pos;
@@ -2479,6 +2479,8 @@ static bool remove_xps_queue(struct xps_dev_maps *dev_maps,
 			break;
 		}
 
+		if (old_maps)
+			RCU_INIT_POINTER(old_maps->attr_map[tci], NULL);
 		RCU_INIT_POINTER(dev_maps->attr_map[tci], NULL);
 		kfree_rcu(map, rcu);
 		return false;
@@ -2499,7 +2501,7 @@ static bool remove_xps_queue_cpu(struct net_device *dev,
 		int i, j;
 
 		for (i = count, j = offset; i--; j++) {
-			if (!remove_xps_queue(dev_maps, tci, j))
+			if (!remove_xps_queue(dev_maps, NULL, tci, j))
 				break;
 		}
 
@@ -2631,7 +2633,7 @@ static void xps_copy_dev_maps(struct xps_dev_maps *dev_maps,
 int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 			  u16 index, enum xps_map_type type)
 {
-	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL;
+	struct xps_dev_maps *dev_maps, *new_dev_maps = NULL, *old_dev_maps = NULL;
 	const unsigned long *online_mask = NULL;
 	bool active = false, copy = false;
 	int i, j, tci, numa_node_id = -2;
@@ -2768,7 +2770,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 		}
 	}
 
-	kfree_rcu(dev_maps, rcu);
+	old_dev_maps = copy ? dev_maps : NULL;
 
 out_no_old_maps:
 	dev_maps = new_dev_maps;
@@ -2794,10 +2796,14 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 			    netif_attr_test_online(j, online_mask, dev_maps->nr_ids))
 				continue;
 
-			active |= remove_xps_queue(dev_maps, tci, index);
+			active |= remove_xps_queue(dev_maps, old_dev_maps, tci,
+						   index);
 		}
 	}
 
+	if (old_dev_maps)
+		kfree_rcu(old_dev_maps, rcu);
+
 	/* free map if not active */
 	if (!active)
 		reset_xps_maps(dev, dev_maps, type);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 14/16] net: NULL the old xps map entries when freeing them
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (12 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 13/16] net: fix use after free in xps Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue Antoine Tenart
  2021-03-12 15:04 ` [PATCH net-next v3 16/16] virtio_net: take the rtnl lock when calling virtnet_set_affinity Antoine Tenart
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck; +Cc: Antoine Tenart, netdev

In __netif_set_xps_queue, old map entries from the old dev_maps are
freed but their corresponding entry in the old dev_maps aren't NULLed.
Fix this.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/core/dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 748e377c7fe3..4f1b38de00ac 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2766,6 +2766,7 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 					continue;
 			}
 
+			RCU_INIT_POINTER(dev_maps->attr_map[tci], NULL);
 			kfree_rcu(map, rcu);
 		}
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (13 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 14/16] net: NULL the old xps map entries when freeing them Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  2021-03-12 20:54   ` Saeed Mahameed
  2021-03-12 15:04 ` [PATCH net-next v3 16/16] virtio_net: take the rtnl lock when calling virtnet_set_affinity Antoine Tenart
  15 siblings, 1 reply; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck, saeedm; +Cc: Antoine Tenart, netdev

netif_set_xps_queue must be called with the rtnl lock taken, and this is
now enforced using ASSERT_RTNL(). mlx5e_attach_netdev was taking the
lock conditionally, fix this by taking the rtnl lock all the time.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ec2fcb2a2977..96cba86b9f0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5557,7 +5557,6 @@ static void mlx5e_update_features(struct net_device *netdev)
 
 int mlx5e_attach_netdev(struct mlx5e_priv *priv)
 {
-	const bool take_rtnl = priv->netdev->reg_state == NETREG_REGISTERED;
 	const struct mlx5e_profile *profile = priv->profile;
 	int max_nch;
 	int err;
@@ -5578,15 +5577,11 @@ int mlx5e_attach_netdev(struct mlx5e_priv *priv)
 	 * 2. Set our default XPS cpumask.
 	 * 3. Build the RQT.
 	 *
-	 * rtnl_lock is required by netif_set_real_num_*_queues in case the
-	 * netdev has been registered by this point (if this function was called
-	 * in the reload or resume flow).
+	 * rtnl_lock is required by netif_set_xps_queue.
 	 */
-	if (take_rtnl)
-		rtnl_lock();
+	rtnl_lock();
 	err = mlx5e_num_channels_changed(priv);
-	if (take_rtnl)
-		rtnl_unlock();
+	rtnl_unlock();
 	if (err)
 		goto out;
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v3 16/16] virtio_net: take the rtnl lock when calling virtnet_set_affinity
  2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
                   ` (14 preceding siblings ...)
  2021-03-12 15:04 ` [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue Antoine Tenart
@ 2021-03-12 15:04 ` Antoine Tenart
  15 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-12 15:04 UTC (permalink / raw)
  To: davem, kuba, alexander.duyck, mst, jasowang; +Cc: Antoine Tenart, netdev

netif_set_xps_queue must be called with the rtnl lock taken, and this is
now enforced using ASSERT_RTNL(). In virtio_net, netif_set_xps_queue is
called by virtnet_set_affinity. As this function can be called from an
ethtool helper, we can't take the rtnl lock directly in it. Instead we
take the rtnl lock when calling virtnet_set_affinity when the rtnl lock
isn't taken already.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 drivers/net/virtio_net.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dde9bbcc5ff0..54d2277f6c98 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2027,7 +2027,9 @@ static int virtnet_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
 	struct virtnet_info *vi = hlist_entry_safe(node, struct virtnet_info,
 						   node);
+	rtnl_lock();
 	virtnet_set_affinity(vi);
+	rtnl_unlock();
 	return 0;
 }
 
@@ -2035,7 +2037,9 @@ static int virtnet_cpu_dead(unsigned int cpu, struct hlist_node *node)
 {
 	struct virtnet_info *vi = hlist_entry_safe(node, struct virtnet_info,
 						   node_dead);
+	rtnl_lock();
 	virtnet_set_affinity(vi);
+	rtnl_unlock();
 	return 0;
 }
 
@@ -2883,7 +2887,9 @@ static int init_vqs(struct virtnet_info *vi)
 		goto err_free;
 
 	get_online_cpus();
+	rtnl_lock();
 	virtnet_set_affinity(vi);
+	rtnl_unlock();
 	put_online_cpus();
 
 	return 0;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue
  2021-03-12 15:04 ` [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue Antoine Tenart
@ 2021-03-12 20:54   ` Saeed Mahameed
  2021-03-15  8:38     ` Antoine Tenart
  2021-03-17  8:55     ` Antoine Tenart
  0 siblings, 2 replies; 22+ messages in thread
From: Saeed Mahameed @ 2021-03-12 20:54 UTC (permalink / raw)
  To: Antoine Tenart, davem, kuba, alexander.duyck, Tariq Toukan,
	Maxim Mykytianskyi
  Cc: netdev

On Fri, 2021-03-12 at 16:04 +0100, Antoine Tenart wrote:
> netif_set_xps_queue must be called with the rtnl lock taken, and this
> is
> now enforced using ASSERT_RTNL(). mlx5e_attach_netdev was taking the
> lock conditionally, fix this by taking the rtnl lock all the time.
> 
> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index ec2fcb2a2977..96cba86b9f0d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -5557,7 +5557,6 @@ static void mlx5e_update_features(struct
> net_device *netdev)
>  
>  int mlx5e_attach_netdev(struct mlx5e_priv *priv)
>  {
> -       const bool take_rtnl = priv->netdev->reg_state ==
> NETREG_REGISTERED;
>         const struct mlx5e_profile *profile = priv->profile;
>         int max_nch;
>         int err;
> @@ -5578,15 +5577,11 @@ int mlx5e_attach_netdev(struct mlx5e_priv
> *priv)
>          * 2. Set our default XPS cpumask.
>          * 3. Build the RQT.
>          *
> -        * rtnl_lock is required by netif_set_real_num_*_queues in case
> the
> -        * netdev has been registered by this point (if this function
> was called
> -        * in the reload or resume flow).
> +        * rtnl_lock is required by netif_set_xps_queue.
>          */

There is a reason why it is conditional:
we had a bug in the past of double locking here:

[ 4255.283960] echo/644 is trying to acquire lock:

 [ 4255.285092] ffffffff85101f90 (rtnl_mutex){+..}, at:
mlx5e_attach_netdev0xd4/0×3d0 [mlx5_core]

 [ 4255.287264] 

 [ 4255.287264] but task is already holding lock:

 [ 4255.288971] ffffffff85101f90 (rtnl_mutex){+..}, at:
ipoib_vlan_add0×7c/0×2d0 [ib_ipoib]

ipoib_vlan_add is called under rtnl and will eventually call 
mlx5e_attach_netdev, we don't have much control over this in mlx5
driver since the rdma stack provides a per-prepared netdev to attach to
our hw. maybe it is time we had a nested rtnl lock .. 

> -       if (take_rtnl)
> -               rtnl_lock();
> +       rtnl_lock();
>         err = mlx5e_num_channels_changed(priv);
> -       if (take_rtnl)
> -               rtnl_unlock();
> +       rtnl_unlock();
>         if (err)
>                 goto out;
>  



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue
  2021-03-12 20:54   ` Saeed Mahameed
@ 2021-03-15  8:38     ` Antoine Tenart
  2021-03-15 14:53       ` Maxim Mikityanskiy
  2021-03-17  8:55     ` Antoine Tenart
  1 sibling, 1 reply; 22+ messages in thread
From: Antoine Tenart @ 2021-03-15  8:38 UTC (permalink / raw)
  To: Maxim Mykytianskyi, Saeed Mahameed, Tariq Toukan,
	alexander.duyck, davem, kuba
  Cc: netdev

Quoting Saeed Mahameed (2021-03-12 21:54:18)
> On Fri, 2021-03-12 at 16:04 +0100, Antoine Tenart wrote:
> > netif_set_xps_queue must be called with the rtnl lock taken, and this
> > is
> > now enforced using ASSERT_RTNL(). mlx5e_attach_netdev was taking the
> > lock conditionally, fix this by taking the rtnl lock all the time.
> > 
> > Signed-off-by: Antoine Tenart <atenart@kernel.org>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++--------
> >  1 file changed, 3 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index ec2fcb2a2977..96cba86b9f0d 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -5557,7 +5557,6 @@ static void mlx5e_update_features(struct
> > net_device *netdev)
> >  
> >  int mlx5e_attach_netdev(struct mlx5e_priv *priv)
> >  {
> > -       const bool take_rtnl = priv->netdev->reg_state ==
> > NETREG_REGISTERED;
> >         const struct mlx5e_profile *profile = priv->profile;
> >         int max_nch;
> >         int err;
> > @@ -5578,15 +5577,11 @@ int mlx5e_attach_netdev(struct mlx5e_priv
> > *priv)
> >          * 2. Set our default XPS cpumask.
> >          * 3. Build the RQT.
> >          *
> > -        * rtnl_lock is required by netif_set_real_num_*_queues in case
> > the
> > -        * netdev has been registered by this point (if this function
> > was called
> > -        * in the reload or resume flow).
> > +        * rtnl_lock is required by netif_set_xps_queue.
> >          */
> 
> There is a reason why it is conditional:
> we had a bug in the past of double locking here:
> 
> [ 4255.283960] echo/644 is trying to acquire lock:
> 
>  [ 4255.285092] ffffffff85101f90 (rtnl_mutex){+..}, at:
> mlx5e_attach_netdev0xd4/0×3d0 [mlx5_core]
> 
>  [ 4255.287264] 
> 
>  [ 4255.287264] but task is already holding lock:
> 
>  [ 4255.288971] ffffffff85101f90 (rtnl_mutex){+..}, at:
> ipoib_vlan_add0×7c/0×2d0 [ib_ipoib]
> 
> ipoib_vlan_add is called under rtnl and will eventually call 
> mlx5e_attach_netdev, we don't have much control over this in mlx5
> driver since the rdma stack provides a per-prepared netdev to attach to
> our hw. maybe it is time we had a nested rtnl lock .. 

Thanks for the explanation. So as you said, we can't based the locking
decision only on the driver own state / information...

What about `take_rtnl = !rtnl_is_locked();`?

Thanks!
Antoine

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue
  2021-03-15  8:38     ` Antoine Tenart
@ 2021-03-15 14:53       ` Maxim Mikityanskiy
  2021-03-15 15:13         ` Antoine Tenart
  0 siblings, 1 reply; 22+ messages in thread
From: Maxim Mikityanskiy @ 2021-03-15 14:53 UTC (permalink / raw)
  To: Antoine Tenart, Saeed Mahameed, Tariq Toukan, alexander.duyck,
	davem, kuba
  Cc: netdev

On 2021-03-15 10:38, Antoine Tenart wrote:
> Quoting Saeed Mahameed (2021-03-12 21:54:18)
>> On Fri, 2021-03-12 at 16:04 +0100, Antoine Tenart wrote:
>>> netif_set_xps_queue must be called with the rtnl lock taken, and this
>>> is
>>> now enforced using ASSERT_RTNL(). mlx5e_attach_netdev was taking the
>>> lock conditionally, fix this by taking the rtnl lock all the time.
>>>
>>> Signed-off-by: Antoine Tenart <atenart@kernel.org>
>>> ---
>>>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++--------
>>>   1 file changed, 3 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index ec2fcb2a2977..96cba86b9f0d 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -5557,7 +5557,6 @@ static void mlx5e_update_features(struct
>>> net_device *netdev)
>>>   
>>>   int mlx5e_attach_netdev(struct mlx5e_priv *priv)
>>>   {
>>> -       const bool take_rtnl = priv->netdev->reg_state ==
>>> NETREG_REGISTERED;
>>>          const struct mlx5e_profile *profile = priv->profile;
>>>          int max_nch;
>>>          int err;
>>> @@ -5578,15 +5577,11 @@ int mlx5e_attach_netdev(struct mlx5e_priv
>>> *priv)
>>>           * 2. Set our default XPS cpumask.
>>>           * 3. Build the RQT.
>>>           *
>>> -        * rtnl_lock is required by netif_set_real_num_*_queues in case
>>> the
>>> -        * netdev has been registered by this point (if this function
>>> was called
>>> -        * in the reload or resume flow).
>>> +        * rtnl_lock is required by netif_set_xps_queue.
>>>           */
>>
>> There is a reason why it is conditional:
>> we had a bug in the past of double locking here:
>>
>> [ 4255.283960] echo/644 is trying to acquire lock:
>>
>>   [ 4255.285092] ffffffff85101f90 (rtnl_mutex){+..}, at:
>> mlx5e_attach_netdev0xd4/0×3d0 [mlx5_core]
>>
>>   [ 4255.287264]
>>
>>   [ 4255.287264] but task is already holding lock:
>>
>>   [ 4255.288971] ffffffff85101f90 (rtnl_mutex){+..}, at:
>> ipoib_vlan_add0×7c/0×2d0 [ib_ipoib]
>>
>> ipoib_vlan_add is called under rtnl and will eventually call
>> mlx5e_attach_netdev, we don't have much control over this in mlx5
>> driver since the rdma stack provides a per-prepared netdev to attach to
>> our hw. maybe it is time we had a nested rtnl lock ..
> 
> Thanks for the explanation. So as you said, we can't based the locking
> decision only on the driver own state / information...
> 
> What about `take_rtnl = !rtnl_is_locked();`?

It won't work, because the lock may be taken by some other unrelated 
thread. By doing `if (!rtnl_is_locked()) rtnl_lock()` we defeat the 
purpose of the lock, because we will proceed to the critical section 
even if we should wait until some other thread releases the lock.

> Thanks!
> Antoine
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue
  2021-03-15 14:53       ` Maxim Mikityanskiy
@ 2021-03-15 15:13         ` Antoine Tenart
  0 siblings, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-15 15:13 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Saeed Mahameed, Tariq Toukan,
	alexander.duyck, davem, kuba
  Cc: netdev

Quoting Maxim Mikityanskiy (2021-03-15 15:53:02)
> On 2021-03-15 10:38, Antoine Tenart wrote:
> > Quoting Saeed Mahameed (2021-03-12 21:54:18)
> >> There is a reason why it is conditional:
> >> we had a bug in the past of double locking here:
> >>
> >> [ 4255.283960] echo/644 is trying to acquire lock:
> >>
> >>   [ 4255.285092] ffffffff85101f90 (rtnl_mutex){+..}, at:
> >> mlx5e_attach_netdev0xd4/0×3d0 [mlx5_core]
> >>
> >>   [ 4255.287264]
> >>
> >>   [ 4255.287264] but task is already holding lock:
> >>
> >>   [ 4255.288971] ffffffff85101f90 (rtnl_mutex){+..}, at:
> >> ipoib_vlan_add0×7c/0×2d0 [ib_ipoib]
> >>
> >> ipoib_vlan_add is called under rtnl and will eventually call
> >> mlx5e_attach_netdev, we don't have much control over this in mlx5
> >> driver since the rdma stack provides a per-prepared netdev to attach to
> >> our hw. maybe it is time we had a nested rtnl lock ..
> > 
> > Thanks for the explanation. So as you said, we can't based the locking
> > decision only on the driver own state / information...
> > 
> > What about `take_rtnl = !rtnl_is_locked();`?
> 
> It won't work, because the lock may be taken by some other unrelated 
> thread. By doing `if (!rtnl_is_locked()) rtnl_lock()` we defeat the 
> purpose of the lock, because we will proceed to the critical section 
> even if we should wait until some other thread releases the lock.

Ah, that's right...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue
  2021-03-12 20:54   ` Saeed Mahameed
  2021-03-15  8:38     ` Antoine Tenart
@ 2021-03-17  8:55     ` Antoine Tenart
  1 sibling, 0 replies; 22+ messages in thread
From: Antoine Tenart @ 2021-03-17  8:55 UTC (permalink / raw)
  To: Maxim Mykytianskyi, Saeed Mahameed, Tariq Toukan,
	alexander.duyck, davem, kuba
  Cc: netdev

Quoting Saeed Mahameed (2021-03-12 21:54:18)
> On Fri, 2021-03-12 at 16:04 +0100, Antoine Tenart wrote:
> > netif_set_xps_queue must be called with the rtnl lock taken, and this
> > is
> > now enforced using ASSERT_RTNL(). mlx5e_attach_netdev was taking the
> > lock conditionally, fix this by taking the rtnl lock all the time.
> 
> There is a reason why it is conditional:
> we had a bug in the past of double locking here:
> 
> [ 4255.283960] echo/644 is trying to acquire lock:
> 
>  [ 4255.285092] ffffffff85101f90 (rtnl_mutex){+..}, at:
> mlx5e_attach_netdev0xd4/0×3d0 [mlx5_core]
> 
>  [ 4255.287264] 
> 
>  [ 4255.287264] but task is already holding lock:
> 
>  [ 4255.288971] ffffffff85101f90 (rtnl_mutex){+..}, at:
> ipoib_vlan_add0×7c/0×2d0 [ib_ipoib]
> 
> ipoib_vlan_add is called under rtnl and will eventually call 
> mlx5e_attach_netdev, we don't have much control over this in mlx5
> driver since the rdma stack provides a per-prepared netdev to attach to
> our hw. maybe it is time we had a nested rtnl lock .. 

Not sure we want to add a nested rtnl lock because of xps. I'd like to
see other options first. Could be having a locking mechanism for xps not
relying on rtnl; if that's possible.

As for this series, patches 6, 15 (this one) and 16 are not linked to
and do not rely on the other patches. They're improvement or fixes for
already existing behaviours. The series already gained enough new
patches since v1 and I don't want to maintain it out-of-tree for too
long, so I'll resend it without patches 6, 15 and 16; and then we'll be
able to focus on the xps locking relationship with rtnl.

Antoine

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-03-17  8:56 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-12 15:04 [PATCH net-next v3 00/16] net: xps: improve the xps maps handling Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 01/16] net-sysfs: convert xps_cpus_show to bitmap_zalloc Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 02/16] net-sysfs: store the return of get_netdev_queue_index in an unsigned int Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 03/16] net-sysfs: make xps_cpus_show and xps_rxqs_show consistent Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 04/16] net: embed num_tc in the xps maps Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 05/16] net: embed nr_ids " Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 06/16] net: assert the rtnl lock is held when calling __netif_set_xps_queue Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 07/16] net: remove the xps possible_mask Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 08/16] net: move the xps maps to an array Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 09/16] net: add an helper to copy xps maps to the new dev_maps Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 10/16] net: improve queue removal readability in __netif_set_xps_queue Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 11/16] net-sysfs: move the rtnl unlock up in the xps show helpers Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 12/16] net-sysfs: move the xps cpus/rxqs retrieval in a common function Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 13/16] net: fix use after free in xps Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 14/16] net: NULL the old xps map entries when freeing them Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue Antoine Tenart
2021-03-12 20:54   ` Saeed Mahameed
2021-03-15  8:38     ` Antoine Tenart
2021-03-15 14:53       ` Maxim Mikityanskiy
2021-03-15 15:13         ` Antoine Tenart
2021-03-17  8:55     ` Antoine Tenart
2021-03-12 15:04 ` [PATCH net-next v3 16/16] virtio_net: take the rtnl lock when calling virtnet_set_affinity Antoine Tenart

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.