* [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality
@ 2023-01-21  4:24 Yury Norov
  2023-01-21  4:24 ` [PATCH 1/9] lib/find: introduce find_nth_and_andnot_bit Yury Norov
                   ` (11 more replies)
  0 siblings, 12 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

cpumask_local_spread() currently checks the local node for the presence of
the i'th CPU, and if it finds nothing, falls back to a flat search among all
non-local CPUs. We can do better by searching CPUs per NUMA hop.
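
In rough terms, the i'th-CPU lookup becomes a walk over NUMA hops outward
from the node. A simplified linear sketch of the idea (patch 3 actually
binary-searches the hops; masks[][], nr_hops and prev_w here stand in for
the sched_domains_numa_masks plumbing):

	/* masks[hop][node]: CPUs within @hop hops of @node (cumulative) */
	for (hop = 0, prev_w = 0; hop < nr_hops; hop++) {
		cpu = hop ? cpumask_nth_and_andnot(i - prev_w, cpus,
						   masks[hop][node],
						   masks[hop - 1][node])
			  : cpumask_nth_and(i, cpus, masks[0][node]);
		if (cpu < nr_cpu_ids)
			return cpu;
		prev_w = cpumask_weight_and(cpus, masks[hop][node]);
	}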

This has significant performance implications on NUMA machines, for example
when using NUMA-aware allocated memory together with NUMA-aware IRQ
affinity hints.

Performance tests from patch 8 of this series for the Mellanox network
driver show:

  TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
  Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
  
  +-------------------------+-----------+------------------+------------------+
  |                         | BW (Gbps) | TX side CPU util | RX side CPU util |
  +-------------------------+-----------+------------------+------------------+
  | Baseline                | 52.3      | 6.4 %            | 17.9 %           |
  +-------------------------+-----------+------------------+------------------+
  | Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
  +-------------------------+-----------+------------------+------------------+
  | Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
  +-------------------------+-----------+------------------+------------------+
  | Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
  +-------------------------+-----------+------------------+------------------+
  
  The bottleneck on the RX side is released, reaching line rate (~1.8x speedup).
  ~30% less CPU utilization on the TX side.

This series was supposed to be included in v6.2, but that didn't happen. It
spent enough time in -next without any issues, so I hope we'll finally see it
in v6.3.

I believe the best way would be to move it together with the scheduler
patches, but I'm OK with trying the bitmap branch again as well.

Tariq Toukan (1):
  net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity
    hints

Valentin Schneider (2):
  sched/topology: Introduce sched_numa_hop_mask()
  sched/topology: Introduce for_each_numa_hop_mask()

Yury Norov (6):
  lib/find: introduce find_nth_and_andnot_bit
  cpumask: introduce cpumask_nth_and_andnot
  sched: add sched_numa_find_nth_cpu()
  cpumask: improve on cpumask_local_spread() locality
  lib/cpumask: reorganize cpumask_local_spread() logic
  lib/cpumask: update comment for cpumask_local_spread()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 +++-
 include/linux/cpumask.h                      | 20 +++++
 include/linux/find.h                         | 33 +++++++
 include/linux/topology.h                     | 33 +++++++
 kernel/sched/topology.c                      | 90 ++++++++++++++++++++
 lib/cpumask.c                                | 52 ++++++-----
 lib/find_bit.c                               |  9 ++
 7 files changed, 230 insertions(+), 25 deletions(-)

-- 
2.34.1



* [PATCH 1/9] lib/find: introduce find_nth_and_andnot_bit
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
@ 2023-01-21  4:24 ` Yury Norov
  2023-01-21  4:24 ` [PATCH 2/9] cpumask: introduce cpumask_nth_and_andnot Yury Norov
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

In the following patches, the function is used to implement in-place bitmap
traversal without storing intermediate results in temporary bitmaps.
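
For a single word, the semantics reduce to picking the n'th set bit of
(addr1 & addr2 & ~addr3), which is what the small_const_nbits() fast path
below does via fns(). A minimal userspace model of that operation (fns()
here mimics the kernel's n'th-set-bit-in-word helper):

	#include <stdio.h>

	/*
	 * Model of fns(): position of the n'th (0-based) set bit in @word,
	 * or 64 (BITS_PER_LONG here) if there is no such bit.
	 */
	static unsigned int fns(unsigned long word, unsigned int n)
	{
		unsigned int bit;

		for (bit = 0; word; bit++, word >>= 1)
			if ((word & 1) && n-- == 0)
				return bit;
		return 64;
	}

	int main(void)
	{
		unsigned long a = 0xf0, b = 0xff, c = 0x10;

		/* a & b & ~c == 0xe0: bits 5, 6, 7 are set */
		printf("%u\n", fns(a & b & ~c, 1));	/* prints 6 */
		return 0;
	}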

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
---
 include/linux/find.h | 33 +++++++++++++++++++++++++++++++++
 lib/find_bit.c       |  9 +++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index ccaf61a0f5fd..4647864a5ffd 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -22,6 +22,9 @@ unsigned long __find_nth_and_bit(const unsigned long *addr1, const unsigned long
 				unsigned long size, unsigned long n);
 unsigned long __find_nth_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long size, unsigned long n);
+unsigned long __find_nth_and_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+					const unsigned long *addr3, unsigned long size,
+					unsigned long n);
 extern unsigned long _find_first_and_bit(const unsigned long *addr1,
 					 const unsigned long *addr2, unsigned long size);
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
@@ -255,6 +258,36 @@ unsigned long find_nth_andnot_bit(const unsigned long *addr1, const unsigned lon
 	return __find_nth_andnot_bit(addr1, addr2, size, n);
 }
 
+/**
+ * find_nth_and_andnot_bit - find N'th set bit in 2 memory regions,
+ *			     excluding those set in 3rd region
+ * @addr1: The 1st address to start the search at
+ * @addr2: The 2nd address to start the search at
+ * @addr3: The 3rd address to start the search at
+ * @size: The maximum number of bits to search
+ * @n: The index of the set bit whose position is needed, counting from 0
+ *
+ * Returns the bit number of the N'th set bit.
+ * If no such bit exists, returns @size.
+ */
+static __always_inline
+unsigned long find_nth_and_andnot_bit(const unsigned long *addr1,
+					const unsigned long *addr2,
+					const unsigned long *addr3,
+					unsigned long size, unsigned long n)
+{
+	if (n >= size)
+		return size;
+
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr1 & *addr2 & (~*addr3) & GENMASK(size - 1, 0);
+
+		return val ? fns(val, n) : size;
+	}
+
+	return __find_nth_and_andnot_bit(addr1, addr2, addr3, size, n);
+}
+
 #ifndef find_first_and_bit
 /**
  * find_first_and_bit - find the first set bit in both memory regions
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 18bc0a7ac8ee..c10920e66788 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -155,6 +155,15 @@ unsigned long __find_nth_andnot_bit(const unsigned long *addr1, const unsigned l
 }
 EXPORT_SYMBOL(__find_nth_andnot_bit);
 
+unsigned long __find_nth_and_andnot_bit(const unsigned long *addr1,
+					const unsigned long *addr2,
+					const unsigned long *addr3,
+					unsigned long size, unsigned long n)
+{
+	return FIND_NTH_BIT(addr1[idx] & addr2[idx] & ~addr3[idx], size, n);
+}
+EXPORT_SYMBOL(__find_nth_and_andnot_bit);
+
 #ifndef find_next_and_bit
 unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/9] cpumask: introduce cpumask_nth_and_andnot
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
  2023-01-21  4:24 ` [PATCH 1/9] lib/find: introduce find_nth_and_andnot_bit Yury Norov
@ 2023-01-21  4:24 ` Yury Norov
  2023-01-21  4:24 ` [PATCH 3/9] sched: add sched_numa_find_nth_cpu() Yury Norov
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

Introduce cpumask_nth_and_andnot() based on find_nth_and_andnot_bit().
It's used in the following patch to traverse cpumasks without storing
an intermediate result in a temporary cpumask.
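
A sketch of the intended call pattern (this_hop and prev_hop are
hypothetical placeholders for consecutive per-node entries of
sched_domains_numa_masks, as used in the next patch): pick the i'th online
CPU that lies within the current hop but not within any closer hop:

	/* i'th CPU that is online, in this hop, and not in a closer hop */
	cpu = cpumask_nth_and_andnot(i, cpu_online_mask, this_hop, prev_hop);
	if (cpu < nr_cpu_ids)
		do_something(cpu);	/* found it */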

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
---
 include/linux/cpumask.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index c2aa0aa26b45..7b16aede7ac5 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -391,6 +391,26 @@ unsigned int cpumask_nth_andnot(unsigned int cpu, const struct cpumask *srcp1,
 				nr_cpumask_bits, cpumask_check(cpu));
 }
 
+/**
+ * cpumask_nth_and_andnot - get the Nth cpu set in the 1st and 2nd cpumask, and clear in the 3rd.
+ * @srcp1: the first cpumask pointer
+ * @srcp2: the second cpumask pointer
+ * @srcp3: the third cpumask pointer
+ * @cpu: the N'th cpu to find, starting from 0
+ *
+ * Returns >= nr_cpu_ids if no such cpu exists.
+ */
+static __always_inline
+unsigned int cpumask_nth_and_andnot(unsigned int cpu, const struct cpumask *srcp1,
+							const struct cpumask *srcp2,
+							const struct cpumask *srcp3)
+{
+	return find_nth_and_andnot_bit(cpumask_bits(srcp1),
+					cpumask_bits(srcp2),
+					cpumask_bits(srcp3),
+					nr_cpumask_bits, cpumask_check(cpu));
+}
+
 #define CPU_BITS_NONE						\
 {								\
 	[0 ... BITS_TO_LONGS(NR_CPUS)-1] = 0UL			\
-- 
2.34.1



* [PATCH 3/9] sched: add sched_numa_find_nth_cpu()
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
  2023-01-21  4:24 ` [PATCH 1/9] lib/find: introduce find_nth_and_andnot_bit Yury Norov
  2023-01-21  4:24 ` [PATCH 2/9] cpumask: introduce cpumask_nth_and_andnot Yury Norov
@ 2023-01-21  4:24 ` Yury Norov
  2023-02-03  0:58   ` Chen Yu
                     ` (2 more replies)
  2023-01-21  4:24 ` [PATCH 4/9] cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (8 subsequent siblings)
  11 siblings, 3 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

The function finds the Nth set CPU in a given cpumask, starting from a given
node.

Leveraging the fact that each hop in sched_domains_numa_masks includes the
same or a greater number of CPUs than the previous one, we can use binary
search on hops instead of a linear walk, which makes the overall complexity
O(log n) in terms of the number of cpumask_weight() calls.
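
The invariant behind the bsearch() call below is that the cumulative
per-hop weights never decrease, so the first hop whose cumulative weight
exceeds the requested index can be found by bisection. A userspace model
with made-up weights (in the kernel, hop_cmp() plays this role against
sched_domains_numa_masks):

	#include <stdio.h>

	/* Hypothetical cumulative CPU counts per hop for some node. */
	static const int hop_weight[] = { 4, 8, 16 };

	/* First hop whose cumulative weight is greater than @n. */
	static int find_hop(int n, int nhops)
	{
		int lo = 0, hi = nhops;

		while (lo < hi) {
			int mid = lo + (hi - lo) / 2;

			if (hop_weight[mid] > n)
				hi = mid;
			else
				lo = mid + 1;
		}
		return lo;
	}

	int main(void)
	{
		printf("%d\n", find_hop(5, 3));	/* CPU #5 lives in hop 1 */
		return 0;
	}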

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
---
 include/linux/topology.h |  8 ++++++
 kernel/sched/topology.c  | 57 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4564faafd0e1..72f264575698 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -245,5 +245,13 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 	return cpumask_of_node(cpu_to_node(cpu));
 }
 
+#ifdef CONFIG_NUMA
+int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
+#else
+static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
+{
+	return cpumask_nth(cpu, cpus);
+}
+#endif	/* CONFIG_NUMA */
 
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a5a54e..2bf89186a10f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -3,6 +3,8 @@
  * Scheduler topology setup/handling methods
  */
 
+#include <linux/bsearch.h>
+
 DEFINE_MUTEX(sched_domains_mutex);
 
 /* Protected by sched_domains_mutex: */
@@ -2067,6 +2069,61 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
 	return found;
 }
 
+struct __cmp_key {
+	const struct cpumask *cpus;
+	struct cpumask ***masks;
+	int node;
+	int cpu;
+	int w;
+};
+
+static int hop_cmp(const void *a, const void *b)
+{
+	struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
+	struct cpumask **cur_hop = *(struct cpumask ***)b;
+	struct __cmp_key *k = (struct __cmp_key *)a;
+
+	if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
+		return 1;
+
+	k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
+	if (k->w <= k->cpu)
+		return 0;
+
+	return -1;
+}
+
+/*
+ * sched_numa_find_nth_cpu() - given the NUMA topology, find the Nth closest
+ *                             cpu to @node from @cpus.
+ * @cpus: cpumask to find a cpu from
+ * @cpu: Nth cpu to find
+ * @node: the node to count hops from
+ * returns: cpu, or nr_cpu_ids when nothing found.
+ */
+int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
+{
+	struct __cmp_key k = { .cpus = cpus, .node = node, .cpu = cpu };
+	struct cpumask ***hop_masks;
+	int hop, ret = nr_cpu_ids;
+
+	rcu_read_lock();
+
+	k.masks = rcu_dereference(sched_domains_numa_masks);
+	if (!k.masks)
+		goto unlock;
+
+	hop_masks = bsearch(&k, k.masks, sched_domains_numa_levels, sizeof(k.masks[0]), hop_cmp);
+	hop = hop_masks - k.masks;
+
+	ret = hop ?
+		cpumask_nth_and_andnot(cpu - k.w, cpus, k.masks[hop][node], k.masks[hop-1][node]) :
+		cpumask_nth_and(cpu, cpus, k.masks[0][node]);
+unlock:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sched_numa_find_nth_cpu);
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
-- 
2.34.1



* [PATCH 4/9] cpumask: improve on cpumask_local_spread() locality
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (2 preceding siblings ...)
  2023-01-21  4:24 ` [PATCH 3/9] sched: add sched_numa_find_nth_cpu() Yury Norov
@ 2023-01-21  4:24 ` Yury Norov
  2023-01-21  4:24 ` [PATCH 5/9] lib/cpumask: reorganize cpumask_local_spread() logic Yury Norov
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

Switch cpumask_local_spread() to use the newly added sched_numa_find_nth_cpu(),
which takes into account the distances to each node in the system.

For the following NUMA configuration:

root@debian:~# numactl -H
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3
node 0 size: 3869 MB
node 0 free: 3740 MB
node 1 cpus: 4 5
node 1 size: 1969 MB
node 1 free: 1937 MB
node 2 cpus: 6 7
node 2 size: 1967 MB
node 2 free: 1873 MB
node 3 cpus: 8 9 10 11 12 13 14 15
node 3 size: 7842 MB
node 3 free: 7723 MB
node distances:
node   0   1   2   3
  0:  10  50  30  70
  1:  50  10  70  30
  2:  30  70  10  50
  3:  70  30  50  10

The new cpumask_local_spread() traverses cpus for each node like this:

node 0:   0   1   2   3   6   7   4   5   8   9  10  11  12  13  14  15
node 1:   4   5   8   9  10  11  12  13  14  15   0   1   2   3   6   7
node 2:   6   7   0   1   2   3   8   9  10  11  12  13  14  15   4   5
node 3:   8   9  10  11  12  13  14  15   4   5   6   7   0   1   2   3
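
An order like the above can be reproduced with a loop of this shape (a
debugging sketch for kernel context; see also the comment that patch 9
adds to cpumask_local_spread()):

	for (i = 0; i < num_online_cpus(); i++)
		pr_cont("%u ", cpumask_local_spread(i, node));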

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
---
 lib/cpumask.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index c7c392514fd3..255974cd6734 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -110,7 +110,7 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
 #endif
 
 /**
- * cpumask_local_spread - select the i'th cpu with local numa cpu's first
+ * cpumask_local_spread - select the i'th cpu based on NUMA distances
  * @i: index number
  * @node: local numa_node
  *
@@ -132,15 +132,7 @@ unsigned int cpumask_local_spread(unsigned int i, int node)
 		if (cpu < nr_cpu_ids)
 			return cpu;
 	} else {
-		/* NUMA first. */
-		cpu = cpumask_nth_and(i, cpu_online_mask, cpumask_of_node(node));
-		if (cpu < nr_cpu_ids)
-			return cpu;
-
-		i -= cpumask_weight_and(cpu_online_mask, cpumask_of_node(node));
-
-		/* Skip NUMA nodes, done above. */
-		cpu = cpumask_nth_andnot(i, cpu_online_mask, cpumask_of_node(node));
+		cpu = sched_numa_find_nth_cpu(cpu_online_mask, i, node);
 		if (cpu < nr_cpu_ids)
 			return cpu;
 	}
-- 
2.34.1



* [PATCH 5/9] lib/cpumask: reorganize cpumask_local_spread() logic
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (3 preceding siblings ...)
  2023-01-21  4:24 ` [PATCH 4/9] cpumask: improve on cpumask_local_spread() locality Yury Norov
@ 2023-01-21  4:24 ` Yury Norov
  2023-01-21  4:24 ` [PATCH 6/9] sched/topology: Introduce sched_numa_hop_mask() Yury Norov
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

Now that all NUMA logic has been moved into sched_numa_find_nth_cpu(), the
else-branch of cpumask_local_spread() is just a function call, and we can
simplify the logic by using a ternary operator.

While here, replace BUG() with WARN_ON().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
---
 lib/cpumask.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index 255974cd6734..10aa15715c0d 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -127,16 +127,12 @@ unsigned int cpumask_local_spread(unsigned int i, int node)
 	/* Wrap: we always want a cpu. */
 	i %= num_online_cpus();
 
-	if (node == NUMA_NO_NODE) {
-		cpu = cpumask_nth(i, cpu_online_mask);
-		if (cpu < nr_cpu_ids)
-			return cpu;
-	} else {
-		cpu = sched_numa_find_nth_cpu(cpu_online_mask, i, node);
-		if (cpu < nr_cpu_ids)
-			return cpu;
-	}
-	BUG();
+	cpu = (node == NUMA_NO_NODE) ?
+		cpumask_nth(i, cpu_online_mask) :
+		sched_numa_find_nth_cpu(cpu_online_mask, i, node);
+
+	WARN_ON(cpu >= nr_cpu_ids);
+	return cpu;
 }
 EXPORT_SYMBOL(cpumask_local_spread);
 
-- 
2.34.1



* [PATCH 6/9] sched/topology: Introduce sched_numa_hop_mask()
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (4 preceding siblings ...)
  2023-01-21  4:24 ` [PATCH 5/9] lib/cpumask: reorganize cpumask_local_spread() logic Yury Norov
@ 2023-01-21  4:24 ` Yury Norov
  2023-01-21  4:24 ` [PATCH 7/9] sched/topology: Introduce for_each_numa_hop_mask() Yury Norov
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

From: Valentin Schneider <vschneid@redhat.com>

Tariq has pointed out that drivers allocating IRQ vectors would benefit
from having smarter NUMA awareness: cpumask_local_spread() only knows
about the local node, and everything outside it lands in the same bucket.

sched_domains_numa_masks is pretty much what we want to hand out (a cpumask
of CPUs reachable within a given distance budget), so introduce
sched_numa_hop_mask() to export those cpumasks.
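
A sketch of the expected calling convention, with do_something() as a
placeholder (per the kernel-doc below, the returned mask is only valid
under RCU):

	rcu_read_lock();
	mask = sched_numa_hop_mask(node, hops);
	if (!IS_ERR(mask))
		do_something(mask);	/* copy it if needed past this section */
	rcu_read_unlock();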

Link: http://lore.kernel.org/r/20220728191203.4055-1-tariqt@nvidia.com
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h |  7 +++++++
 kernel/sched/topology.c  | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 72f264575698..344c2362755a 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -247,11 +247,18 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 
 #ifdef CONFIG_NUMA
 int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
+extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
 #else
 static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 {
 	return cpumask_nth(cpu, cpus);
 }
+
+static inline const struct cpumask *
+sched_numa_hop_mask(unsigned int node, unsigned int hops)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
 #endif	/* CONFIG_NUMA */
 
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 2bf89186a10f..1233affc106c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2124,6 +2124,39 @@ int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(sched_numa_find_nth_cpu);
+
+/**
+ * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
+ *                         @node
+ * @node: The node to count hops from.
+ * @hops: Include CPUs up to that many hops away. 0 means local node.
+ *
+ * Return: On success, a pointer to a cpumask of CPUs at most @hops away from
+ * @node, an error value otherwise.
+ *
+ * Requires rcu_read_lock to be held. The returned cpumask is only valid
+ * within that read-side section; copy it if required beyond that.
+ *
+ * Note that not all hops are equal in distance; see sched_init_numa() for how
+ * distances and masks are handled.
+ * Also note that this is a reflection of sched_domains_numa_masks, which may change
+ * during the lifetime of the system (offline nodes are taken out of the masks).
+ */
+const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops)
+{
+	struct cpumask ***masks;
+
+	if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
+		return ERR_PTR(-EINVAL);
+
+	masks = rcu_dereference(sched_domains_numa_masks);
+	if (!masks)
+		return ERR_PTR(-EBUSY);
+
+	return masks[hops][node];
+}
+EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
+
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
-- 
2.34.1



* [PATCH 7/9] sched/topology: Introduce for_each_numa_hop_mask()
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (5 preceding siblings ...)
  2023-01-21  4:24 ` [PATCH 6/9] sched/topology: Introduce sched_numa_hop_mask() Yury Norov
@ 2023-01-21  4:24 ` Yury Norov
  2023-01-21  4:24 ` [PATCH 8/9] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints Yury Norov
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

From: Valentin Schneider <vschneid@redhat.com>

The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
reachable within a given distance budget; wrap the logic for iterating over
all (distance, mask) values inside an iterator macro.
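
Typical usage, as patch 8 adopts in mlx5: walk the hops outward, visiting
each CPU exactly once by masking out the previous hop (do_something() is a
placeholder):

	const struct cpumask *prev = cpu_none_mask, *mask;
	int cpu;

	rcu_read_lock();
	for_each_numa_hop_mask(mask, node) {
		for_each_cpu_andnot(cpu, mask, prev)
			do_something(cpu);
		prev = mask;
	}
	rcu_read_unlock();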

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 344c2362755a..fea32377f7c7 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -261,4 +261,22 @@ sched_numa_hop_mask(unsigned int node, unsigned int hops)
 }
 #endif	/* CONFIG_NUMA */
 
+/**
+ * for_each_numa_hop_mask - iterate over cpumasks of increasing NUMA distance
+ *                          from a given node.
+ * @mask: the iteration variable.
+ * @node: the NUMA node to start the search from.
+ *
+ * Requires rcu_read_lock to be held.
+ *
+ * Yields cpu_online_mask for @node == NUMA_NO_NODE.
+ */
+#define for_each_numa_hop_mask(mask, node)				       \
+	for (unsigned int __hops = 0;					       \
+	     mask = (node != NUMA_NO_NODE || __hops) ?			       \
+		     sched_numa_hop_mask(node, __hops) :		       \
+		     cpu_online_mask,					       \
+	     !IS_ERR_OR_NULL(mask);					       \
+	     __hops++)
+
 #endif /* _LINUX_TOPOLOGY_H */
-- 
2.34.1



* [PATCH 8/9] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (6 preceding siblings ...)
  2023-01-21  4:24 ` [PATCH 7/9] sched/topology: Introduce for_each_numa_hop_mask() Yury Norov
@ 2023-01-21  4:24 ` Yury Norov
  2023-01-21  4:24 ` [PATCH 9/9] lib/cpumask: update comment for cpumask_local_spread() Yury Norov
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

From: Tariq Toukan <tariqt@nvidia.com>

In the IRQ affinity hints, replace the binary NUMA preference (local /
remote) with the improved for_each_numa_hop_mask() API, which minds the
actual distances, so that remote nodes at a short distance are preferred
over farther ones.

This has significant performance implications when using NUMA-aware
allocated memory (follow [1] and derivatives for example).

[1]
drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
   int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));

Performance tests:

TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121

+-------------------------+-----------+------------------+------------------+
|                         | BW (Gbps) | TX side CPU util | RX side CPU util |
+-------------------------+-----------+------------------+------------------+
| Baseline                | 52.3      | 6.4 %            | 17.9 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
+-------------------------+-----------+------------------+------------------+

The bottleneck on the RX side is released, reaching line rate (~1.8x speedup).
~30% less CPU utilization on the TX side.

* CPU util on active cores only.

Setup details (similar for both sides):

NIC: ConnectX6-DX dual port, 100 Gbps each.
Single port used in the tests.

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           2
NUMA node(s):        16
Vendor ID:           AuthenticAMD
CPU family:          25
Model:               1
Model name:          AMD EPYC 7763 64-Core Processor
Stepping:            1
CPU MHz:             2594.804
BogoMIPS:            4890.73
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-7,128-135
NUMA node1 CPU(s):   8-15,136-143
NUMA node2 CPU(s):   16-23,144-151
NUMA node3 CPU(s):   24-31,152-159
NUMA node4 CPU(s):   32-39,160-167
NUMA node5 CPU(s):   40-47,168-175
NUMA node6 CPU(s):   48-55,176-183
NUMA node7 CPU(s):   56-63,184-191
NUMA node8 CPU(s):   64-71,192-199
NUMA node9 CPU(s):   72-79,200-207
NUMA node10 CPU(s):  80-87,208-215
NUMA node11 CPU(s):  88-95,216-223
NUMA node12 CPU(s):  96-103,224-231
NUMA node13 CPU(s):  104-111,232-239
NUMA node14 CPU(s):  112-119,240-247
NUMA node15 CPU(s):  120-127,248-255
..

$ numactl -H
..
node distances:
node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  0:  10  11  11  11  12  12  12  12  32  32  32  32  32  32  32  32
  1:  11  10  11  11  12  12  12  12  32  32  32  32  32  32  32  32
  2:  11  11  10  11  12  12  12  12  32  32  32  32  32  32  32  32
  3:  11  11  11  10  12  12  12  12  32  32  32  32  32  32  32  32
  4:  12  12  12  12  10  11  11  11  32  32  32  32  32  32  32  32
  5:  12  12  12  12  11  10  11  11  32  32  32  32  32  32  32  32
  6:  12  12  12  12  11  11  10  11  32  32  32  32  32  32  32  32
  7:  12  12  12  12  11  11  11  10  32  32  32  32  32  32  32  32
  8:  32  32  32  32  32  32  32  32  10  11  11  11  12  12  12  12
  9:  32  32  32  32  32  32  32  32  11  10  11  11  12  12  12  12
 10:  32  32  32  32  32  32  32  32  11  11  10  11  12  12  12  12
 11:  32  32  32  32  32  32  32  32  11  11  11  10  12  12  12  12
 12:  32  32  32  32  32  32  32  32  12  12  12  12  10  11  11  11
 13:  32  32  32  32  32  32  32  32  12  12  12  12  11  10  11  11
 14:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  10  11
 15:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  11  10

$ cat /sys/class/net/ens5f0/device/numa_node
14

Affinity hints (127 IRQs):
Before:
331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
347: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
348: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
349: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004
350: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008
351: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000010
352: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000020
353: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000040
354: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080
355: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
356: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000200
357: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000400
358: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000800
359: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00001000
360: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000
361: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00004000
362: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000
363: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000
364: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00020000
365: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00040000
366: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000
367: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00100000
368: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00200000
369: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00400000
370: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00800000
371: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,01000000
372: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,02000000
373: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,04000000
374: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,08000000
375: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,10000000
376: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000
377: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,40000000
378: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,80000000
379: 00000000,00000000,00000000,00000000,00000000,00000000,00000001,00000000
380: 00000000,00000000,00000000,00000000,00000000,00000000,00000002,00000000
381: 00000000,00000000,00000000,00000000,00000000,00000000,00000004,00000000
382: 00000000,00000000,00000000,00000000,00000000,00000000,00000008,00000000
383: 00000000,00000000,00000000,00000000,00000000,00000000,00000010,00000000
384: 00000000,00000000,00000000,00000000,00000000,00000000,00000020,00000000
385: 00000000,00000000,00000000,00000000,00000000,00000000,00000040,00000000
386: 00000000,00000000,00000000,00000000,00000000,00000000,00000080,00000000
387: 00000000,00000000,00000000,00000000,00000000,00000000,00000100,00000000
388: 00000000,00000000,00000000,00000000,00000000,00000000,00000200,00000000
389: 00000000,00000000,00000000,00000000,00000000,00000000,00000400,00000000
390: 00000000,00000000,00000000,00000000,00000000,00000000,00000800,00000000
391: 00000000,00000000,00000000,00000000,00000000,00000000,00001000,00000000
392: 00000000,00000000,00000000,00000000,00000000,00000000,00002000,00000000
393: 00000000,00000000,00000000,00000000,00000000,00000000,00004000,00000000
394: 00000000,00000000,00000000,00000000,00000000,00000000,00008000,00000000
395: 00000000,00000000,00000000,00000000,00000000,00000000,00010000,00000000
396: 00000000,00000000,00000000,00000000,00000000,00000000,00020000,00000000
397: 00000000,00000000,00000000,00000000,00000000,00000000,00040000,00000000
398: 00000000,00000000,00000000,00000000,00000000,00000000,00080000,00000000
399: 00000000,00000000,00000000,00000000,00000000,00000000,00100000,00000000
400: 00000000,00000000,00000000,00000000,00000000,00000000,00200000,00000000
401: 00000000,00000000,00000000,00000000,00000000,00000000,00400000,00000000
402: 00000000,00000000,00000000,00000000,00000000,00000000,00800000,00000000
403: 00000000,00000000,00000000,00000000,00000000,00000000,01000000,00000000
404: 00000000,00000000,00000000,00000000,00000000,00000000,02000000,00000000
405: 00000000,00000000,00000000,00000000,00000000,00000000,04000000,00000000
406: 00000000,00000000,00000000,00000000,00000000,00000000,08000000,00000000
407: 00000000,00000000,00000000,00000000,00000000,00000000,10000000,00000000
408: 00000000,00000000,00000000,00000000,00000000,00000000,20000000,00000000
409: 00000000,00000000,00000000,00000000,00000000,00000000,40000000,00000000
410: 00000000,00000000,00000000,00000000,00000000,00000000,80000000,00000000
411: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
412: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
413: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
414: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
415: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
416: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
417: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
418: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
419: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
420: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
421: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
422: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
423: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
424: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
425: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
426: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
427: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
428: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
429: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
430: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
431: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
432: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
433: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
434: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
435: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
436: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
437: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
438: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
439: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
440: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
441: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
442: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
443: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
444: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
445: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
446: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
447: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
448: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
449: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
450: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
451: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
452: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
453: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
454: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
455: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
456: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
457: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000

After:
331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
347: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
348: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
349: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
350: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
351: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
352: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
353: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
354: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
355: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
356: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
357: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
358: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
359: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
360: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
361: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000
362: 00000000,00000000,00000000,00000000,00008000,00000000,00000000,00000000
363: 00000000,00000000,00000000,00000000,01000000,00000000,00000000,00000000
364: 00000000,00000000,00000000,00000000,02000000,00000000,00000000,00000000
365: 00000000,00000000,00000000,00000000,04000000,00000000,00000000,00000000
366: 00000000,00000000,00000000,00000000,08000000,00000000,00000000,00000000
367: 00000000,00000000,00000000,00000000,10000000,00000000,00000000,00000000
368: 00000000,00000000,00000000,00000000,20000000,00000000,00000000,00000000
369: 00000000,00000000,00000000,00000000,40000000,00000000,00000000,00000000
370: 00000000,00000000,00000000,00000000,80000000,00000000,00000000,00000000
371: 00000001,00000000,00000000,00000000,00000000,00000000,00000000,00000000
372: 00000002,00000000,00000000,00000000,00000000,00000000,00000000,00000000
373: 00000004,00000000,00000000,00000000,00000000,00000000,00000000,00000000
374: 00000008,00000000,00000000,00000000,00000000,00000000,00000000,00000000
375: 00000010,00000000,00000000,00000000,00000000,00000000,00000000,00000000
376: 00000020,00000000,00000000,00000000,00000000,00000000,00000000,00000000
377: 00000040,00000000,00000000,00000000,00000000,00000000,00000000,00000000
378: 00000080,00000000,00000000,00000000,00000000,00000000,00000000,00000000
379: 00000100,00000000,00000000,00000000,00000000,00000000,00000000,00000000
380: 00000200,00000000,00000000,00000000,00000000,00000000,00000000,00000000
381: 00000400,00000000,00000000,00000000,00000000,00000000,00000000,00000000
382: 00000800,00000000,00000000,00000000,00000000,00000000,00000000,00000000
383: 00001000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
384: 00002000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
385: 00004000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
386: 00008000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
387: 01000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
388: 02000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
389: 04000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
390: 08000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
391: 10000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
392: 20000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
393: 40000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
394: 80000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
395: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
396: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
397: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
398: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
399: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
400: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
401: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
402: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
403: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
404: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
405: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
406: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
407: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
408: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
409: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
410: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
411: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
412: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
413: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
414: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
415: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
416: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
417: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
418: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
419: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
420: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
421: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
422: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
423: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
424: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
425: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
426: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
427: 00000000,00000001,00000000,00000000,00000000,00000000,00000000,00000000
428: 00000000,00000002,00000000,00000000,00000000,00000000,00000000,00000000
429: 00000000,00000004,00000000,00000000,00000000,00000000,00000000,00000000
430: 00000000,00000008,00000000,00000000,00000000,00000000,00000000,00000000
431: 00000000,00000010,00000000,00000000,00000000,00000000,00000000,00000000
432: 00000000,00000020,00000000,00000000,00000000,00000000,00000000,00000000
433: 00000000,00000040,00000000,00000000,00000000,00000000,00000000,00000000
434: 00000000,00000080,00000000,00000000,00000000,00000000,00000000,00000000
435: 00000000,00000100,00000000,00000000,00000000,00000000,00000000,00000000
436: 00000000,00000200,00000000,00000000,00000000,00000000,00000000,00000000
437: 00000000,00000400,00000000,00000000,00000000,00000000,00000000,00000000
438: 00000000,00000800,00000000,00000000,00000000,00000000,00000000,00000000
439: 00000000,00001000,00000000,00000000,00000000,00000000,00000000,00000000
440: 00000000,00002000,00000000,00000000,00000000,00000000,00000000,00000000
441: 00000000,00004000,00000000,00000000,00000000,00000000,00000000,00000000
442: 00000000,00008000,00000000,00000000,00000000,00000000,00000000,00000000
443: 00000000,00010000,00000000,00000000,00000000,00000000,00000000,00000000
444: 00000000,00020000,00000000,00000000,00000000,00000000,00000000,00000000
445: 00000000,00040000,00000000,00000000,00000000,00000000,00000000,00000000
446: 00000000,00080000,00000000,00000000,00000000,00000000,00000000,00000000
447: 00000000,00100000,00000000,00000000,00000000,00000000,00000000,00000000
448: 00000000,00200000,00000000,00000000,00000000,00000000,00000000,00000000
449: 00000000,00400000,00000000,00000000,00000000,00000000,00000000,00000000
450: 00000000,00800000,00000000,00000000,00000000,00000000,00000000,00000000
451: 00000000,01000000,00000000,00000000,00000000,00000000,00000000,00000000
452: 00000000,02000000,00000000,00000000,00000000,00000000,00000000,00000000
453: 00000000,04000000,00000000,00000000,00000000,00000000,00000000,00000000
454: 00000000,08000000,00000000,00000000,00000000,00000000,00000000,00000000
455: 00000000,10000000,00000000,00000000,00000000,00000000,00000000,00000000
456: 00000000,20000000,00000000,00000000,00000000,00000000,00000000,00000000
457: 00000000,40000000,00000000,00000000,00000000,00000000,00000000,00000000

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
[Tweaked API use]
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 8f7580fec193..c111e8657d54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -817,9 +817,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
 static int comp_irqs_request(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = dev->priv.eq_table;
+	const struct cpumask *prev = cpu_none_mask;
+	const struct cpumask *mask;
 	int ncomp_eqs = table->num_comp_eqs;
 	u16 *cpus;
 	int ret;
+	int cpu;
 	int i;
 
 	ncomp_eqs = table->num_comp_eqs;
@@ -838,8 +841,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
 		ret = -ENOMEM;
 		goto free_irqs;
 	}
-	for (i = 0; i < ncomp_eqs; i++)
-		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
+
+	i = 0;
+	rcu_read_lock();
+	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
+		for_each_cpu_andnot(cpu, mask, prev) {
+			cpus[i] = cpu;
+			if (++i == ncomp_eqs)
+				goto spread_done;
+		}
+		prev = mask;
+	}
+spread_done:
+	rcu_read_unlock();
 	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
 	kfree(cpus);
 	if (ret < 0)
-- 
2.34.1



* [PATCH 9/9] lib/cpumask: update comment for cpumask_local_spread()
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (7 preceding siblings ...)
  2023-01-21  4:24 ` [PATCH 8/9] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints Yury Norov
@ 2023-01-21  4:24 ` Yury Norov
  2023-01-22 12:57 ` [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Tariq Toukan
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-01-21  4:24 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Daniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: Yury Norov, linux-crypto, netdev, linux-rdma

Now that we have an iterator-based alternative for the very common case of
using cpumask_local_spread() for all cpus in a row, it's worth mentioning
that in the comment to cpumask_local_spread().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 lib/cpumask.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index 10aa15715c0d..98291b07c756 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -114,11 +114,29 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
  * @i: index number
  * @node: local numa_node
  *
- * This function selects an online CPU according to a numa aware policy;
- * local cpus are returned first, followed by non-local ones, then it
- * wraps around.
+ * Returns an online CPU according to a NUMA-aware policy; local CPUs are
+ * returned first, followed by non-local ones, then it wraps around.
  *
- * It's not very efficient, but useful for setup.
+ * For those who want to enumerate all CPUs based on their NUMA distances,
+ * i.e. call this function in a loop, like:
+ *
+ * for (i = 0; i < num_online_cpus(); i++) {
+ *	cpu = cpumask_local_spread(i, node);
+ *	do_something(cpu);
+ * }
+ *
+ * There's a better alternative based on for_each()-like iterators:
+ *
+ *	for_each_numa_hop_mask(mask, node) {
+ *		for_each_cpu_andnot(cpu, mask, prev)
+ *			do_something(cpu);
+ *		prev = mask;
+ *	}
+ *
+ * It's more verbose than the above, but simpler and cheaper: iterator-based
+ * enumeration is O(sched_domains_numa_levels * nr_cpu_ids), while
+ * cpumask_local_spread() called for each CPU is
+ * O(sched_domains_numa_levels * nr_cpu_ids * log(nr_cpu_ids)).
  */
 unsigned int cpumask_local_spread(unsigned int i, int node)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread
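
One detail worth spelling out: the iterator snippet in the comment above
elides the initialization of prev and the RCU read-side lock that
for_each_numa_hop_mask() needs (compare the mlx5 patch earlier in this
series). A complete version of the example, with do_something() as a
hypothetical per-CPU callback, would be:

	const struct cpumask *mask, *prev = cpu_none_mask;
	unsigned int cpu;

	rcu_read_lock();
	for_each_numa_hop_mask(mask, node) {
		for_each_cpu_andnot(cpu, mask, prev)
			do_something(cpu);	/* hypothetical callback */
		prev = mask;
	}
	rcu_read_unlock();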

* Re: [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (8 preceding siblings ...)
  2023-01-21  4:24 ` [PATCH 9/9] lib/cpumask: update comment for cpumask_local_spread() Yury Norov
@ 2023-01-22 12:57 ` Tariq Toukan
  2023-01-23  9:57   ` Valentin Schneider
  2023-02-08  4:20 ` patchwork-bot+netdevbpf
  2023-02-08 15:39 ` [PATCH 1/1] ice: Change assigning method of the CPU affinity masks Pawel Chmielewski
  11 siblings, 1 reply; 36+ messages in thread
From: Tariq Toukan @ 2023-01-22 12:57 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, David S. Miller, Andy Shevchenko,
	Barry Song, Ben Segall, Dietmar Eggemann, Gal Pressman,
	Greg Kroah-Hartman, Haniel Bristot de Oliveira, Heiko Carstens,
	Ingo Molnar, Jacob Keller, Jakub Kicinski, Jason Gunthorpe,
	Jesse Brandeburg, Jonathan Cameron, Juri Lelli, Leon Romanovsky,
	Linus Torvalds, Mel Gorman, Peter Lafreniere, Peter Zijlstra,
	Rasmus Villemoes, Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot
  Cc: linux-crypto, netdev, linux-rdma



On 21/01/2023 6:24, Yury Norov wrote:
> cpumask_local_spread() currently checks local node for presence of i'th
> CPU, and then if it finds nothing makes a flat search among all non-local
> CPUs. We can do it better by checking CPUs per NUMA hops.
> 
> This has significant performance implications on NUMA machines, for example
> when using NUMA-aware allocated memory together with NUMA-aware IRQ
> affinity hints.
> 
> Performance tests from patch 8 of this series for mellanox network
> driver show:
> 
>    TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
>    Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
>    
>    +-------------------------+-----------+------------------+------------------+
>    |                         | BW (Gbps) | TX side CPU util | RX side CPU util |
>    +-------------------------+-----------+------------------+------------------+
>    | Baseline                | 52.3      | 6.4 %            | 17.9 %           |
>    +-------------------------+-----------+------------------+------------------+
>    | Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
>    +-------------------------+-----------+------------------+------------------+
>    | Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
>    +-------------------------+-----------+------------------+------------------+
>    | Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
>    +-------------------------+-----------+------------------+------------------+
>    
>    Bottleneck in RX side is released, reached linerate (~1.8x speedup).
>    ~30% less cpu util on TX.
> 
> This series was supposed to be included in v6.2, but that didn't happen. It
> spent enough in -next without any issues, so I hope we'll finally see it
> in v6.3.
> 
> I believe, the best way would be moving it with scheduler patches, but I'm
> OK to try again with bitmap branch as well.

Now that Yury dropped several controversial bitmap patches from the PR, 
the rest are mostly in sched, or new API that's used by sched.

Valentin, what do you think? Can you take it to your sched branch?

> 
> Tariq Toukan (1):
>    net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity
>      hints
> 
> Valentin Schneider (2):
>    sched/topology: Introduce sched_numa_hop_mask()
>    sched/topology: Introduce for_each_numa_hop_mask()
> 
> Yury Norov (6):
>    lib/find: introduce find_nth_and_andnot_bit
>    cpumask: introduce cpumask_nth_and_andnot
>    sched: add sched_numa_find_nth_cpu()
>    cpumask: improve on cpumask_local_spread() locality
>    lib/cpumask: reorganize cpumask_local_spread() logic
>    lib/cpumask: update comment for cpumask_local_spread()
> 
>   drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 +++-
>   include/linux/cpumask.h                      | 20 +++++
>   include/linux/find.h                         | 33 +++++++
>   include/linux/topology.h                     | 33 +++++++
>   kernel/sched/topology.c                      | 90 ++++++++++++++++++++
>   lib/cpumask.c                                | 52 ++++++-----
>   lib/find_bit.c                               |  9 ++
>   7 files changed, 230 insertions(+), 25 deletions(-)
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality
  2023-01-22 12:57 ` [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Tariq Toukan
@ 2023-01-23  9:57   ` Valentin Schneider
  2023-01-29  8:07     ` Tariq Toukan
  2023-02-08  2:25     ` Jakub Kicinski
  0 siblings, 2 replies; 36+ messages in thread
From: Valentin Schneider @ 2023-01-23  9:57 UTC (permalink / raw)
  To: Tariq Toukan, Yury Norov, linux-kernel, David S. Miller,
	Andy Shevchenko, Barry Song, Ben Segall, Dietmar Eggemann,
	Gal Pressman, Greg Kroah-Hartman, Haniel Bristot de Oliveira,
	Heiko Carstens, Ingo Molnar, Jacob Keller, Jakub Kicinski,
	Jason Gunthorpe, Jesse Brandeburg, Jonathan Cameron, Juri Lelli,
	Leon Romanovsky, Linus Torvalds, Mel Gorman, Peter Lafreniere,
	Peter Zijlstra, Rasmus Villemoes, Saeed Mahameed, Steven Rostedt,
	Tariq Toukan, Tony Luck, Vincent Guittot
  Cc: linux-crypto, netdev, linux-rdma

On 22/01/23 14:57, Tariq Toukan wrote:
> On 21/01/2023 6:24, Yury Norov wrote:
>>
>> This series was supposed to be included in v6.2, but that didn't happen. It
>> spent enough in -next without any issues, so I hope we'll finally see it
>> in v6.3.
>>
>> I believe, the best way would be moving it with scheduler patches, but I'm
>> OK to try again with bitmap branch as well.
>
> Now that Yury dropped several controversial bitmap patches form the PR,
> the rest are mostly in sched, or new API that's used by sched.
>
> Valentin, what do you think? Can you take it to your sched branch?
>

I would if I had one :-)

Peter/Ingo, any objections to stashing this in tip/sched/core?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality
  2023-01-23  9:57   ` Valentin Schneider
@ 2023-01-29  8:07     ` Tariq Toukan
  2023-01-30 20:22       ` Jakub Kicinski
  2023-02-08  2:25     ` Jakub Kicinski
  1 sibling, 1 reply; 36+ messages in thread
From: Tariq Toukan @ 2023-01-29  8:07 UTC (permalink / raw)
  To: Valentin Schneider, Yury Norov, linux-kernel, David S. Miller,
	Andy Shevchenko, Barry Song, Ben Segall, Dietmar Eggemann,
	Gal Pressman, Greg Kroah-Hartman, Haniel Bristot de Oliveira,
	Heiko Carstens, Ingo Molnar, Jacob Keller, Jakub Kicinski,
	Jason Gunthorpe, Jesse Brandeburg, Jonathan Cameron, Juri Lelli,
	Leon Romanovsky, Linus Torvalds, Mel Gorman, Peter Lafreniere,
	Peter Zijlstra, Rasmus Villemoes, Saeed Mahameed, Steven Rostedt,
	Tariq Toukan, Tony Luck, Vincent Guittot
  Cc: linux-crypto, netdev, linux-rdma



On 23/01/2023 11:57, Valentin Schneider wrote:
> On 22/01/23 14:57, Tariq Toukan wrote:
>> On 21/01/2023 6:24, Yury Norov wrote:
>>>
>>> This series was supposed to be included in v6.2, but that didn't happen. It
>>> spent enough in -next without any issues, so I hope we'll finally see it
>>> in v6.3.
>>>
>>> I believe, the best way would be moving it with scheduler patches, but I'm
>>> OK to try again with bitmap branch as well.
>>
>> Now that Yury dropped several controversial bitmap patches form the PR,
>> the rest are mostly in sched, or new API that's used by sched.
>>
>> Valentin, what do you think? Can you take it to your sched branch?
>>
> 
> I would if I had one :-)
> 

Oh I see :)

> Peter/Ingo, any objections to stashing this in tip/sched/core?
> 

Hi Peter and Ingo,

Can you please look into it, so we'll have enough time to act (in
case...) during this kernel cycle?

We already missed one kernel...

Thanks,
Tariq

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality
  2023-01-29  8:07     ` Tariq Toukan
@ 2023-01-30 20:22       ` Jakub Kicinski
  2023-02-02 17:33         ` Jakub Kicinski
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Kicinski @ 2023-01-30 20:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tariq Toukan, Valentin Schneider, Yury Norov, linux-kernel,
	David S. Miller, Andy Shevchenko, Barry Song, Ben Segall,
	Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Haniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Rasmus Villemoes, Saeed Mahameed,
	Steven Rostedt, Tariq Toukan, Tony Luck, Vincent Guittot,
	linux-crypto, netdev, linux-rdma

On Sun, 29 Jan 2023 10:07:58 +0200 Tariq Toukan wrote:
> > Peter/Ingo, any objections to stashing this in tip/sched/core?
> 
> Can you please look into it? So we'll have enough time to act (in 
> case...) during this kernel.
> 
> We already missed one kernel...

We really need this in linux-next by the end of the week. PTAL.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality
  2023-01-30 20:22       ` Jakub Kicinski
@ 2023-02-02 17:33         ` Jakub Kicinski
  2023-02-02 17:37           ` Yury Norov
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Kicinski @ 2023-02-02 17:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tariq Toukan, Valentin Schneider, Yury Norov, linux-kernel,
	David S. Miller, Andy Shevchenko, Barry Song, Ben Segall,
	Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Haniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Rasmus Villemoes, Saeed Mahameed,
	Steven Rostedt, Tariq Toukan, Tony Luck, Vincent Guittot,
	linux-crypto, netdev, linux-rdma

On Mon, 30 Jan 2023 12:22:06 -0800 Jakub Kicinski wrote:
> On Sun, 29 Jan 2023 10:07:58 +0200 Tariq Toukan wrote:
> > > Peter/Ingo, any objections to stashing this in tip/sched/core?  
> > 
> > Can you please look into it? So we'll have enough time to act (in 
> > case...) during this kernel.
> > 
> > We already missed one kernel...  
> 
> We really need this in linux-next by the end of the week. PTAL.

Peter, could you please take a look? Linux doesn't have an API for
basic, common-sense IRQ distribution on AMD systems. It's important :(

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality
  2023-02-02 17:33         ` Jakub Kicinski
@ 2023-02-02 17:37           ` Yury Norov
  0 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-02-02 17:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Peter Zijlstra, Tariq Toukan, Valentin Schneider, linux-kernel,
	David S. Miller, Andy Shevchenko, Barry Song, Ben Segall,
	Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Haniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Rasmus Villemoes, Saeed Mahameed,
	Steven Rostedt, Tariq Toukan, Tony Luck, Vincent Guittot,
	linux-crypto, netdev, linux-rdma

On Thu, Feb 2, 2023 at 9:33 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 30 Jan 2023 12:22:06 -0800 Jakub Kicinski wrote:
> > On Sun, 29 Jan 2023 10:07:58 +0200 Tariq Toukan wrote:
> > > > Peter/Ingo, any objections to stashing this in tip/sched/core?
> > >
> > > Can you please look into it? So we'll have enough time to act (in
> > > case...) during this kernel.
> > >
> > > We already missed one kernel...
> >
> > We really need this in linux-next by the end of the week. PTAL.
>
> Peter, could you please take a look? Linux doesn't have an API for
> basic, common sense IRQ distribution on AMD systems. It's important :(

FWIW, it's already been in linux-next since mid-December through the
bitmap branch, and no issues have been reported so far.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/9] sched: add sched_numa_find_nth_cpu()
  2023-01-21  4:24 ` [PATCH 3/9] sched: add sched_numa_find_nth_cpu() Yury Norov
@ 2023-02-03  0:58   ` Chen Yu
  2023-02-07  5:09   ` Jakub Kicinski
  2023-02-17  1:39   ` Yury Norov
  2 siblings, 0 replies; 36+ messages in thread
From: Chen Yu @ 2023-02-03  0:58 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Dietmar Eggemann, Gal Pressman, Greg Kroah-Hartman,
	Haniel Bristot de Oliveira, Heiko Carstens, Ingo Molnar,
	Jacob Keller, Jakub Kicinski, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan, Tariq Toukan,
	Tony Luck, Valentin Schneider, Vincent Guittot, linux-crypto,
	netdev, linux-rdma

On 2023-01-20 at 20:24:30 -0800, Yury Norov wrote:
> The function finds Nth set CPU in a given cpumask starting from a given
> node.
> 
> Leveraging the fact that each hop in sched_domains_numa_masks includes the
> same or greater number of CPUs than the previous one, we can use binary
> search on hops instead of linear walk, which makes the overall complexity
> of O(log n) in terms of number of cpumask_weight() calls.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Acked-by: Tariq Toukan <tariqt@nvidia.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
> ---
>  include/linux/topology.h |  8 ++++++
>  kernel/sched/topology.c  | 57 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
>
[snip] 
> + * sched_numa_find_nth_cpu() - given the NUMA topology, find the Nth next cpu
> + *                             closest to @cpu from @cpumask.
Just a minor question: the @cpu below is used as the index, right? What does
"closest to @cpu" mean above?
> + * cpumask: cpumask to find a cpu from
> + * cpu: Nth cpu to find
Maybe also add a description for @node?

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 36+ messages in thread
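
As an aside, the binary search mentioned in the commit message works
because the hop masks are cumulative: each hop's mask is a superset of the
previous one, so the weight of (@cpumask AND hop mask) never decreases
with the hop index. A rough sketch of the idea under that assumption; this
is an illustration, not the kernel's actual sched_numa_find_nth_cpu()
implementation:

	/*
	 * Sketch: return the n-th CPU of @cpumask, ordered by NUMA distance
	 * from the node whose cumulative hop masks are hop_masks[0..nr_hops-1].
	 * The caller must guarantee that
	 * n < cpumask_weight_and(cpumask, hop_masks[nr_hops - 1]).
	 */
	static unsigned int nth_cpu_by_hops(const struct cpumask *cpumask,
					    const struct cpumask **hop_masks,
					    int nr_hops, unsigned int n)
	{
		int lo = 0, hi = nr_hops - 1;

		/* find the first hop whose cumulative weight exceeds n */
		while (lo < hi) {
			int mid = lo + (hi - lo) / 2;

			if (cpumask_weight_and(cpumask, hop_masks[mid]) > n)
				hi = mid;
			else
				lo = mid + 1;
		}

		/* discount CPUs already accounted for at closer hops */
		if (lo)
			n -= cpumask_weight_and(cpumask, hop_masks[lo - 1]);

		/* n-th CPU in cpumask & hop_masks[lo] & ~(previous hop) */
		return cpumask_nth_and_andnot(n, cpumask, hop_masks[lo],
					      lo ? hop_masks[lo - 1]
						 : cpu_none_mask);
	}

This costs O(log(nr_hops)) cpumask_weight_and() calls plus one
cpumask_nth_and_andnot() (patch 2/9), which is where the O(log n) number
of cpumask_weight() calls in the commit message comes from.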

* Re: [PATCH 3/9] sched: add sched_numa_find_nth_cpu()
  2023-01-21  4:24 ` [PATCH 3/9] sched: add sched_numa_find_nth_cpu() Yury Norov
  2023-02-03  0:58   ` Chen Yu
@ 2023-02-07  5:09   ` Jakub Kicinski
  2023-02-07 10:29     ` Valentin Schneider
  2023-02-17  1:39   ` Yury Norov
  2 siblings, 1 reply; 36+ messages in thread
From: Jakub Kicinski @ 2023-02-07  5:09 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Yury Norov, linux-kernel, David S. Miller, Andy Shevchenko,
	Barry Song, Ben Segall, Dietmar Eggemann, Gal Pressman,
	Greg Kroah-Hartman, Haniel Bristot de Oliveira, Heiko Carstens,
	Ingo Molnar, Jacob Keller, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan, Tariq Toukan,
	Tony Luck, Vincent Guittot, linux-crypto, netdev, linux-rdma

On Fri, 20 Jan 2023 20:24:30 -0800 Yury Norov wrote:
> The function finds Nth set CPU in a given cpumask starting from a given
> node.
> 
> Leveraging the fact that each hop in sched_domains_numa_masks includes the
> same or greater number of CPUs than the previous one, we can use binary
> search on hops instead of linear walk, which makes the overall complexity
> of O(log n) in terms of number of cpumask_weight() calls.

Valentin, would you be willing to give us a SoB or Review tag for 
this one?  We'd like to take the whole series via networking, if 
that's okay.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/9] sched: add sched_numa_find_nth_cpu()
  2023-02-07  5:09   ` Jakub Kicinski
@ 2023-02-07 10:29     ` Valentin Schneider
  0 siblings, 0 replies; 36+ messages in thread
From: Valentin Schneider @ 2023-02-07 10:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Yury Norov, linux-kernel, David S. Miller, Andy Shevchenko,
	Barry Song, Ben Segall, Dietmar Eggemann, Gal Pressman,
	Greg Kroah-Hartman, Haniel Bristot de Oliveira, Heiko Carstens,
	Ingo Molnar, Jacob Keller, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Leon Romanovsky, Linus Torvalds,
	Mel Gorman, Peter Lafreniere, Peter Zijlstra, Rasmus Villemoes,
	Saeed Mahameed, Steven Rostedt, Tariq Toukan, Tariq Toukan,
	Tony Luck, Vincent Guittot, linux-crypto, netdev, linux-rdma

On 06/02/23 21:09, Jakub Kicinski wrote:
> On Fri, 20 Jan 2023 20:24:30 -0800 Yury Norov wrote:
>> The function finds Nth set CPU in a given cpumask starting from a given
>> node.
>>
>> Leveraging the fact that each hop in sched_domains_numa_masks includes the
>> same or greater number of CPUs than the previous one, we can use binary
>> search on hops instead of linear walk, which makes the overall complexity
>> of O(log n) in terms of number of cpumask_weight() calls.
>
> Valentin, would you be willing to give us a SoB or Review tag for
> this one?  We'd like to take the whole series via networking, if
> that's okay.

Sure, feel free to add

  Reviewed-by: Valentin Schneider <vschneid@redhat.com>

to patches that don't already have it.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality
  2023-01-23  9:57   ` Valentin Schneider
  2023-01-29  8:07     ` Tariq Toukan
@ 2023-02-08  2:25     ` Jakub Kicinski
  1 sibling, 0 replies; 36+ messages in thread
From: Jakub Kicinski @ 2023-02-08  2:25 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Tariq Toukan, Yury Norov, linux-kernel, David S. Miller,
	Andy Shevchenko, Barry Song, Ben Segall, Dietmar Eggemann,
	Gal Pressman, Greg Kroah-Hartman, Haniel Bristot de Oliveira,
	Heiko Carstens, Ingo Molnar, Jacob Keller, Jason Gunthorpe,
	Jesse Brandeburg, Jonathan Cameron, Juri Lelli, Leon Romanovsky,
	Linus Torvalds, Mel Gorman, Peter Lafreniere, Peter Zijlstra,
	Rasmus Villemoes, Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tony Luck, Vincent Guittot, linux-crypto, netdev, linux-rdma

On Mon, 23 Jan 2023 09:57:43 +0000 Valentin Schneider wrote:
> On 22/01/23 14:57, Tariq Toukan wrote:
> > On 21/01/2023 6:24, Yury Norov wrote:  
> >>
> >> This series was supposed to be included in v6.2, but that didn't happen. It
> >> spent enough in -next without any issues, so I hope we'll finally see it
> >> in v6.3.
> >>
> >> I believe, the best way would be moving it with scheduler patches, but I'm
> >> OK to try again with bitmap branch as well.  
> >
> > Now that Yury dropped several controversial bitmap patches form the PR,
> > the rest are mostly in sched, or new API that's used by sched.
> >
> > Valentin, what do you think? Can you take it to your sched branch?
>
> I would if I had one :-)
> 
> Peter/Ingo, any objections to stashing this in tip/sched/core?

No replies... so let me take it via networking.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (9 preceding siblings ...)
  2023-01-22 12:57 ` [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Tariq Toukan
@ 2023-02-08  4:20 ` patchwork-bot+netdevbpf
  2023-02-08 15:39 ` [PATCH 1/1] ice: Change assigning method of the CPU affinity masks Pawel Chmielewski
  11 siblings, 0 replies; 36+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-02-08  4:20 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, davem, andriy.shevchenko, baohua, bsegall,
	dietmar.eggemann, gal, gregkh, bristot, hca, mingo,
	jacob.e.keller, kuba, jgg, jesse.brandeburg, Jonathan.Cameron,
	juri.lelli, leonro, torvalds, mgorman, peter, peterz, linux,
	saeedm, rostedt, tariqt, ttoukan.linux, tony.luck, vschneid,
	vincent.guittot, linux-crypto, netdev, linux-rdma

Hello:

This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 20 Jan 2023 20:24:27 -0800 you wrote:
> cpumask_local_spread() currently checks local node for presence of i'th
> CPU, and then if it finds nothing makes a flat search among all non-local
> CPUs. We can do it better by checking CPUs per NUMA hops.
> 
> This has significant performance implications on NUMA machines, for example
> when using NUMA-aware allocated memory together with NUMA-aware IRQ
> affinity hints.
> 
> [...]

Here is the summary with links:
  - [1/9] lib/find: introduce find_nth_and_andnot_bit
    https://git.kernel.org/netdev/net-next/c/43245117806f
  - [2/9] cpumask: introduce cpumask_nth_and_andnot
    https://git.kernel.org/netdev/net-next/c/62f4386e564d
  - [3/9] sched: add sched_numa_find_nth_cpu()
    https://git.kernel.org/netdev/net-next/c/cd7f55359c90
  - [4/9] cpumask: improve on cpumask_local_spread() locality
    https://git.kernel.org/netdev/net-next/c/406d394abfcd
  - [5/9] lib/cpumask: reorganize cpumask_local_spread() logic
    https://git.kernel.org/netdev/net-next/c/b1beed72b8b7
  - [6/9] sched/topology: Introduce sched_numa_hop_mask()
    https://git.kernel.org/netdev/net-next/c/9feae65845f7
  - [7/9] sched/topology: Introduce for_each_numa_hop_mask()
    https://git.kernel.org/netdev/net-next/c/06ac01721f7d
  - [8/9] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
    https://git.kernel.org/netdev/net-next/c/2acda57736de
  - [9/9] lib/cpumask: update comment for cpumask_local_spread()
    https://git.kernel.org/netdev/net-next/c/2ac4980c57f5

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
  2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
                   ` (10 preceding siblings ...)
  2023-02-08  4:20 ` patchwork-bot+netdevbpf
@ 2023-02-08 15:39 ` Pawel Chmielewski
       [not found]   ` <CAH-L+nO+KyzPSX_F0fh+9i=0rW1hoBPFTGbXc1EX+4MGYOR1kA@mail.gmail.com>
                     ` (6 more replies)
  11 siblings, 7 replies; 36+ messages in thread
From: Pawel Chmielewski @ 2023-02-08 15:39 UTC (permalink / raw)
  To: yury.norov
  Cc: Jonathan.Cameron, andriy.shevchenko, baohua, bristot, bsegall,
	davem, dietmar.eggemann, gal, gregkh, hca, jacob.e.keller,
	jesse.brandeburg, jgg, juri.lelli, kuba, leonro, linux-crypto,
	linux-kernel, linux-rdma, linux, mgorman, mingo, netdev, peter,
	peterz, rostedt, saeedm, tariqt, tony.luck, torvalds,
	ttoukan.linux, vincent.guittot, vschneid, Pawel Chmielewski

With the introduction of sched_numa_hop_mask() and
for_each_numa_hop_mask(), the affinity masks for queue vectors can be
conveniently set by preferring the CPUs that are closest to the NUMA node
of the parent PCI device.

Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_base.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index e864634d66bc..fd3550d15c9e 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -122,8 +122,6 @@ static int ice_vsi_alloc_q_vector(struct ice_vsi *vsi, u16 v_idx)
 	if (vsi->type == ICE_VSI_VF)
 		goto out;
 	/* only set affinity_mask if the CPU is online */
-	if (cpu_online(v_idx))
-		cpumask_set_cpu(v_idx, &q_vector->affinity_mask);
 
 	/* This will not be called in the driver load path because the netdev
 	 * will not be created yet. All other cases with register the NAPI
@@ -659,8 +657,10 @@ int ice_vsi_wait_one_rx_ring(struct ice_vsi *vsi, bool ena, u16 rxq_idx)
  */
 int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
 {
+	cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
 	struct device *dev = ice_pf_to_dev(vsi->back);
-	u16 v_idx;
+	int numa_node = dev->numa_node;
+	u16 v_idx, cpu = 0;
 	int err;
 
 	if (vsi->q_vectors[0]) {
@@ -674,6 +674,17 @@ int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
 			goto err_out;
 	}
 
+	v_idx = 0;
+	for_each_numa_hop_mask(aff_mask, numa_node) {
+		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask)
+			if (v_idx < vsi->num_q_vectors) {
+				if (cpu_online(cpu))
+					cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
+				v_idx++;
+			}
+		last_aff_mask = aff_mask;
+	}
+
 	return 0;
 
 err_out:
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
       [not found]   ` <CAH-L+nO+KyzPSX_F0fh+9i=0rW1hoBPFTGbXc1EX+4MGYOR1kA@mail.gmail.com>
@ 2023-02-08 16:08     ` Andy Shevchenko
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Shevchenko @ 2023-02-08 16:08 UTC (permalink / raw)
  To: Kalesh Anakkur Purayil
  Cc: Pawel Chmielewski, yury.norov, Jonathan.Cameron, baohua, bristot,
	bsegall, davem, dietmar.eggemann, gal, gregkh, hca,
	jacob.e.keller, jesse.brandeburg, jgg, juri.lelli, kuba, leonro,
	linux-crypto, linux-kernel, linux-rdma, linux, mgorman, mingo,
	netdev, peter, peterz, rostedt, saeedm, tariqt, tony.luck,
	torvalds, ttoukan.linux, vincent.guittot, vschneid

On Wed, Feb 08, 2023 at 09:18:14PM +0530, Kalesh Anakkur Purayil wrote:
> On Wed, Feb 8, 2023 at 9:11 PM Pawel Chmielewski <
> pawel.chmielewski@intel.com> wrote:

...

> > +       u16 v_idx, cpu = 0;
> >
> [Kalesh]: if you initialize v_idx to 0 here, you can avoid the assignment
> below

I would avoid doing this.

The problem is that it will become harder to maintain and more error-prone
during development.

So, please leave as is.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-08 15:39 ` [PATCH 1/1] ice: Change assigning method of the CPU affinity masks Pawel Chmielewski
       [not found]   ` <CAH-L+nO+KyzPSX_F0fh+9i=0rW1hoBPFTGbXc1EX+4MGYOR1kA@mail.gmail.com>
@ 2023-02-08 16:39   ` Yury Norov
  2023-02-08 16:58     ` Andy Shevchenko
  2023-02-08 19:11   ` kernel test robot
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-02-08 16:39 UTC (permalink / raw)
  To: Pawel Chmielewski
  Cc: Jonathan.Cameron, andriy.shevchenko, baohua, bristot, bsegall,
	davem, dietmar.eggemann, gal, gregkh, hca, jacob.e.keller,
	jesse.brandeburg, jgg, juri.lelli, kuba, leonro, linux-crypto,
	linux-kernel, linux-rdma, linux, mgorman, mingo, netdev, peter,
	peterz, rostedt, saeedm, tariqt, tony.luck, torvalds,
	ttoukan.linux, vincent.guittot, vschneid

On Wed, Feb 08, 2023 at 04:39:05PM +0100, Pawel Chmielewski wrote:
> With the introduction of sched_numa_hop_mask() and
> for_each_numa_hop_mask(), the affinity masks for queue vectors can be
> conveniently set by preferring the CPUs that are closest to the NUMA node
> of the parent PCI device.
> 
> Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_base.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index e864634d66bc..fd3550d15c9e 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> @@ -122,8 +122,6 @@ static int ice_vsi_alloc_q_vector(struct ice_vsi *vsi, u16 v_idx)
>  	if (vsi->type == ICE_VSI_VF)
>  		goto out;
>  	/* only set affinity_mask if the CPU is online */
> -	if (cpu_online(v_idx))
> -		cpumask_set_cpu(v_idx, &q_vector->affinity_mask);
>  
>  	/* This will not be called in the driver load path because the netdev
>  	 * will not be created yet. All other cases with register the NAPI
> @@ -659,8 +657,10 @@ int ice_vsi_wait_one_rx_ring(struct ice_vsi *vsi, bool ena, u16 rxq_idx)
>   */
>  int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
>  {
> +	cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
>  	struct device *dev = ice_pf_to_dev(vsi->back);
> -	u16 v_idx;
> +	int numa_node = dev->numa_node;
> +	u16 v_idx, cpu = 0;
>  	int err;
>  
>  	if (vsi->q_vectors[0]) {
> @@ -674,6 +674,17 @@ int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
>  			goto err_out;
>  	}
>  
> +	v_idx = 0;
> +	for_each_numa_hop_mask(aff_mask, numa_node) {
> +		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask)
> +			if (v_idx < vsi->num_q_vectors) {
> +				if (cpu_online(cpu))
> +					cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
> +				v_idx++;
> +			}
                        
                        else
                                goto out;

> +		last_aff_mask = aff_mask;
> +	}
> +
        out:

>  	return 0;
>  
>  err_out:
> -- 
> 2.37.3

Would it make sense to increment v_idx only if the matched CPU is online?
That would create a less sparse array of vectors...

Thanks,
Yury

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-08 16:39   ` Yury Norov
@ 2023-02-08 16:58     ` Andy Shevchenko
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Shevchenko @ 2023-02-08 16:58 UTC (permalink / raw)
  To: Yury Norov
  Cc: Pawel Chmielewski, Jonathan.Cameron, baohua, bristot, bsegall,
	davem, dietmar.eggemann, gal, gregkh, hca, jacob.e.keller,
	jesse.brandeburg, jgg, juri.lelli, kuba, leonro, linux-crypto,
	linux-kernel, linux-rdma, linux, mgorman, mingo, netdev, peter,
	peterz, rostedt, saeedm, tariqt, tony.luck, torvalds,
	ttoukan.linux, vincent.guittot, vschneid

On Wed, Feb 08, 2023 at 08:39:20AM -0800, Yury Norov wrote:
> On Wed, Feb 08, 2023 at 04:39:05PM +0100, Pawel Chmielewski wrote:

...

> > +	v_idx = 0;
> > +	for_each_numa_hop_mask(aff_mask, numa_node) {
> > +		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask)
> > +			if (v_idx < vsi->num_q_vectors) {
> > +				if (cpu_online(cpu))
> > +					cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
> > +				v_idx++;
> > +			}

>                         else
>                                 goto out;

In this case the inverted conditional makes more sense:

			if (v_idx >= vsi->num_q_vectors)
				goto out;

			if (cpu_online(cpu))
				cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
			v_idx++;

(indentation level will be decreased).

> > +		last_aff_mask = aff_mask;
> > +	}
> > +
>         out:
> 
> >  	return 0;

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-08 15:39 ` [PATCH 1/1] ice: Change assigning method of the CPU affinity masks Pawel Chmielewski
       [not found]   ` <CAH-L+nO+KyzPSX_F0fh+9i=0rW1hoBPFTGbXc1EX+4MGYOR1kA@mail.gmail.com>
  2023-02-08 16:39   ` Yury Norov
@ 2023-02-08 19:11   ` kernel test robot
  2023-02-09  2:41     ` Philip Li
  2023-02-08 19:22   ` kernel test robot
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: kernel test robot @ 2023-02-08 19:11 UTC (permalink / raw)
  To: Pawel Chmielewski, yury.norov
  Cc: oe-kbuild-all, Jonathan.Cameron, andriy.shevchenko, baohua,
	bristot, bsegall, davem, dietmar.eggemann, gal, gregkh, hca,
	jacob.e.keller, jesse.brandeburg, jgg, juri.lelli, kuba, leonro,
	linux-crypto, linux-kernel, linux-rdma, linux, mgorman, mingo,
	netdev, peter, rostedt, saeedm, tariqt, tony.luck

Hi Pawel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tnguy-next-queue/dev-queue]
[also build test WARNING on linus/master v6.2-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pawel-Chmielewski/ice-Change-assigning-method-of-the-CPU-affinity-masks/20230208-234144
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git dev-queue
patch link:    https://lore.kernel.org/r/20230208153905.109912-1-pawel.chmielewski%40intel.com
patch subject: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20230209/202302090307.GQOJ4jik-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/33971c3245ae75900dbc4cc9aa2b76ff9cdb534c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Pawel-Chmielewski/ice-Change-assigning-method-of-the-CPU-affinity-masks/20230208-234144
        git checkout 33971c3245ae75900dbc4cc9aa2b76ff9cdb534c
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/20230208153905.109912-1-pawel.chmielewski@intel.com

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/intel/ice/ice_base.c: In function 'ice_vsi_alloc_q_vectors':
   drivers/net/ethernet/intel/ice/ice_base.c:678:9: error: implicit declaration of function 'for_each_numa_hop_mask'; did you mean 'for_each_node_mask'? [-Werror=implicit-function-declaration]
     678 |         for_each_numa_hop_mask(aff_mask, numa_node) {
         |         ^~~~~~~~~~~~~~~~~~~~~~
         |         for_each_node_mask
   drivers/net/ethernet/intel/ice/ice_base.c:678:52: error: expected ';' before '{' token
     678 |         for_each_numa_hop_mask(aff_mask, numa_node) {
         |                                                    ^~
         |                                                    ;
>> drivers/net/ethernet/intel/ice/ice_base.c:663:20: warning: unused variable 'cpu' [-Wunused-variable]
     663 |         u16 v_idx, cpu = 0;
         |                    ^~~
>> drivers/net/ethernet/intel/ice/ice_base.c:660:31: warning: unused variable 'last_aff_mask' [-Wunused-variable]
     660 |         cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
         |                               ^~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/cpu +663 drivers/net/ethernet/intel/ice/ice_base.c

   650	
   651	/**
   652	 * ice_vsi_alloc_q_vectors - Allocate memory for interrupt vectors
   653	 * @vsi: the VSI being configured
   654	 *
   655	 * We allocate one q_vector per queue interrupt. If allocation fails we
   656	 * return -ENOMEM.
   657	 */
   658	int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
   659	{
 > 660		cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
   661		struct device *dev = ice_pf_to_dev(vsi->back);
   662		int numa_node = dev->numa_node;
 > 663		u16 v_idx, cpu = 0;
   664		int err;
   665	
   666		if (vsi->q_vectors[0]) {
   667			dev_dbg(dev, "VSI %d has existing q_vectors\n", vsi->vsi_num);
   668			return -EEXIST;
   669		}
   670	
   671		for (v_idx = 0; v_idx < vsi->num_q_vectors; v_idx++) {
   672			err = ice_vsi_alloc_q_vector(vsi, v_idx);
   673			if (err)
   674				goto err_out;
   675		}
   676	
   677		v_idx = 0;
 > 678		for_each_numa_hop_mask(aff_mask, numa_node) {
   679			for_each_cpu_andnot(cpu, aff_mask, last_aff_mask)
   680				if (v_idx < vsi->num_q_vectors) {
   681					if (cpu_online(cpu))
   682						cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
   683					v_idx++;
   684				}
   685			last_aff_mask = aff_mask;
   686		}
   687	
   688		return 0;
   689	
   690	err_out:
   691		while (v_idx--)
   692			ice_free_q_vector(vsi, v_idx);
   693	
   694		dev_err(dev, "Failed to allocate %d q_vector for VSI %d, ret=%d\n",
   695			vsi->num_q_vectors, vsi->vsi_num, err);
   696		vsi->num_q_vectors = 0;
   697		return err;
   698	}
   699	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-08 15:39 ` [PATCH 1/1] ice: Change assigning method of the CPU affinity masks Pawel Chmielewski
                     ` (2 preceding siblings ...)
  2023-02-08 19:11   ` kernel test robot
@ 2023-02-08 19:22   ` kernel test robot
  2023-02-08 23:21   ` Jakub Kicinski
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-02-08 19:22 UTC (permalink / raw)
  To: Pawel Chmielewski, yury.norov
  Cc: oe-kbuild-all, Jonathan.Cameron, andriy.shevchenko, baohua,
	bristot, bsegall, davem, dietmar.eggemann, gal, gregkh, hca,
	jacob.e.keller, jesse.brandeburg, jgg, juri.lelli, kuba, leonro,
	linux-crypto, linux-kernel, linux-rdma, linux, mgorman, mingo,
	netdev, peter, peterz, rostedt, saeedm, tariqt, tony.luck

Hi Pawel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tnguy-next-queue/dev-queue]
[also build test ERROR on linus/master v6.2-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pawel-Chmielewski/ice-Change-assigning-method-of-the-CPU-affinity-masks/20230208-234144
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git dev-queue
patch link:    https://lore.kernel.org/r/20230208153905.109912-1-pawel.chmielewski%40intel.com
patch subject: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
config: arm-randconfig-r046-20230209 (https://download.01.org/0day-ci/archive/20230209/202302090331.7L2Y93PP-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/33971c3245ae75900dbc4cc9aa2b76ff9cdb534c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Pawel-Chmielewski/ice-Change-assigning-method-of-the-CPU-affinity-masks/20230208-234144
        git checkout 33971c3245ae75900dbc4cc9aa2b76ff9cdb534c
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/net/ethernet/intel/ice/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/20230208153905.109912-1-pawel.chmielewski@intel.com

All errors (new ones prefixed by >>):

   drivers/net/ethernet/intel/ice/ice_base.c: In function 'ice_vsi_alloc_q_vectors':
>> drivers/net/ethernet/intel/ice/ice_base.c:662:28: error: 'struct device' has no member named 'numa_node'
     662 |         int numa_node = dev->numa_node;
         |                            ^~
>> drivers/net/ethernet/intel/ice/ice_base.c:678:9: error: implicit declaration of function 'for_each_numa_hop_mask'; did you mean 'for_each_node_mask'? [-Werror=implicit-function-declaration]
     678 |         for_each_numa_hop_mask(aff_mask, numa_node) {
         |         ^~~~~~~~~~~~~~~~~~~~~~
         |         for_each_node_mask
>> drivers/net/ethernet/intel/ice/ice_base.c:678:52: error: expected ';' before '{' token
     678 |         for_each_numa_hop_mask(aff_mask, numa_node) {
         |                                                    ^~
         |                                                    ;
   drivers/net/ethernet/intel/ice/ice_base.c:663:20: warning: unused variable 'cpu' [-Wunused-variable]
     663 |         u16 v_idx, cpu = 0;
         |                    ^~~
   drivers/net/ethernet/intel/ice/ice_base.c:660:31: warning: unused variable 'last_aff_mask' [-Wunused-variable]
     660 |         cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
         |                               ^~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +662 drivers/net/ethernet/intel/ice/ice_base.c

   650	
   651	/**
   652	 * ice_vsi_alloc_q_vectors - Allocate memory for interrupt vectors
   653	 * @vsi: the VSI being configured
   654	 *
   655	 * We allocate one q_vector per queue interrupt. If allocation fails we
   656	 * return -ENOMEM.
   657	 */
   658	int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
   659	{
   660		cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
   661		struct device *dev = ice_pf_to_dev(vsi->back);
 > 662		int numa_node = dev->numa_node;
   663		u16 v_idx, cpu = 0;
   664		int err;
   665	
   666		if (vsi->q_vectors[0]) {
   667			dev_dbg(dev, "VSI %d has existing q_vectors\n", vsi->vsi_num);
   668			return -EEXIST;
   669		}
   670	
   671		for (v_idx = 0; v_idx < vsi->num_q_vectors; v_idx++) {
   672			err = ice_vsi_alloc_q_vector(vsi, v_idx);
   673			if (err)
   674				goto err_out;
   675		}
   676	
   677		v_idx = 0;
 > 678		for_each_numa_hop_mask(aff_mask, numa_node) {
   679			for_each_cpu_andnot(cpu, aff_mask, last_aff_mask)
   680				if (v_idx < vsi->num_q_vectors) {
   681					if (cpu_online(cpu))
   682						cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
   683					v_idx++;
   684				}
   685			last_aff_mask = aff_mask;
   686		}
   687	
   688		return 0;
   689	
   690	err_out:
   691		while (v_idx--)
   692			ice_free_q_vector(vsi, v_idx);
   693	
   694		dev_err(dev, "Failed to allocate %d q_vector for VSI %d, ret=%d\n",
   695			vsi->num_q_vectors, vsi->vsi_num, err);
   696		vsi->num_q_vectors = 0;
   697		return err;
   698	}
   699	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-08 15:39 ` [PATCH 1/1] ice: Change assigning method of the CPU affinity masks Pawel Chmielewski
                     ` (3 preceding siblings ...)
  2023-02-08 19:22   ` kernel test robot
@ 2023-02-08 23:21   ` Jakub Kicinski
  2023-02-09  5:14   ` kernel test robot
  2023-02-16 14:54   ` [PATCH v2 " Pawel Chmielewski
  6 siblings, 0 replies; 36+ messages in thread
From: Jakub Kicinski @ 2023-02-08 23:21 UTC (permalink / raw)
  To: Pawel Chmielewski
  Cc: yury.norov, Jonathan.Cameron, andriy.shevchenko, baohua, bristot,
	bsegall, davem, dietmar.eggemann, gal, gregkh, hca,
	jacob.e.keller, jesse.brandeburg, jgg, juri.lelli, leonro,
	linux-crypto, linux-kernel, linux-rdma, linux, mgorman, mingo,
	netdev, peter, peterz, rostedt, saeedm, tariqt, tony.luck,
	torvalds, ttoukan.linux, vincent.guittot, vschneid

On Wed,  8 Feb 2023 16:39:05 +0100 Pawel Chmielewski wrote:
> With the introduction of sched_numa_hop_mask() and
> for_each_numa_hop_mask(), the affinity masks for queue vectors can be
> conveniently set by preferring the CPUs that are closest to the NUMA node
> of the parent PCI device.

Damn, you had this one locked and ready, didn't you.. :)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-08 19:11   ` kernel test robot
@ 2023-02-09  2:41     ` Philip Li
  0 siblings, 0 replies; 36+ messages in thread
From: Philip Li @ 2023-02-09  2:41 UTC (permalink / raw)
  To: kernel test robot
  Cc: Pawel Chmielewski, yury.norov, oe-kbuild-all, Jonathan.Cameron,
	andriy.shevchenko, baohua, bristot, bsegall, davem,
	dietmar.eggemann, gal, gregkh, hca, jacob.e.keller,
	jesse.brandeburg, jgg, juri.lelli, kuba, leonro, linux-crypto,
	linux-kernel, linux-rdma, linux, mgorman, mingo, netdev, peter,
	rostedt, saeedm, tariqt, tony.luck

On Thu, Feb 09, 2023 at 03:11:55AM +0800, kernel test robot wrote:
> Hi Pawel,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on tnguy-next-queue/dev-queue]
> [also build test WARNING on linus/master v6.2-rc7]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Pawel-Chmielewski/ice-Change-assigning-method-of-the-CPU-affinity-masks/20230208-234144
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git dev-queue
> patch link:    https://lore.kernel.org/r/20230208153905.109912-1-pawel.chmielewski%40intel.com
> patch subject: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
> config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20230209/202302090307.GQOJ4jik-lkp@intel.com/config)
> compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
> reproduce (this is a W=1 build):
>         # https://github.com/intel-lab-lkp/linux/commit/33971c3245ae75900dbc4cc9aa2b76ff9cdb534c
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Pawel-Chmielewski/ice-Change-assigning-method-of-the-CPU-affinity-masks/20230208-234144
>         git checkout 33971c3245ae75900dbc4cc9aa2b76ff9cdb534c
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         make W=1 O=build_dir ARCH=x86_64 olddefconfig
>         make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/
> 
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <lkp@intel.com>
> | Link: https://lore.kernel.org/oe-kbuild-all/20230208153905.109912-1-pawel.chmielewski@intel.com

Sorry, the link above is wrong; it should be

	Link: https://lore.kernel.org/oe-kbuild-all/202302090307.GQOJ4jik-lkp@intel.com/

> 
> All warnings (new ones prefixed by >>):
> 
>    drivers/net/ethernet/intel/ice/ice_base.c: In function 'ice_vsi_alloc_q_vectors':
>    drivers/net/ethernet/intel/ice/ice_base.c:678:9: error: implicit declaration of function 'for_each_numa_hop_mask'; did you mean 'for_each_node_mask'? [-Werror=implicit-function-declaration]
>      678 |         for_each_numa_hop_mask(aff_mask, numa_node) {
>          |         ^~~~~~~~~~~~~~~~~~~~~~
>          |         for_each_node_mask
>    drivers/net/ethernet/intel/ice/ice_base.c:678:52: error: expected ';' before '{' token
>      678 |         for_each_numa_hop_mask(aff_mask, numa_node) {
>          |                                                    ^~
>          |                                                    ;
> >> drivers/net/ethernet/intel/ice/ice_base.c:663:20: warning: unused variable 'cpu' [-Wunused-variable]
>      663 |         u16 v_idx, cpu = 0;
>          |                    ^~~
> >> drivers/net/ethernet/intel/ice/ice_base.c:660:31: warning: unused variable 'last_aff_mask' [-Wunused-variable]
>      660 |         cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
>          |                               ^~~~~~~~~~~~~
>    cc1: some warnings being treated as errors
> 
> 
> vim +/cpu +663 drivers/net/ethernet/intel/ice/ice_base.c
> 
>    650	
>    651	/**
>    652	 * ice_vsi_alloc_q_vectors - Allocate memory for interrupt vectors
>    653	 * @vsi: the VSI being configured
>    654	 *
>    655	 * We allocate one q_vector per queue interrupt. If allocation fails we
>    656	 * return -ENOMEM.
>    657	 */
>    658	int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
>    659	{
>  > 660		cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
>    661		struct device *dev = ice_pf_to_dev(vsi->back);
>    662		int numa_node = dev->numa_node;
>  > 663		u16 v_idx, cpu = 0;
>    664		int err;
>    665	
>    666		if (vsi->q_vectors[0]) {
>    667			dev_dbg(dev, "VSI %d has existing q_vectors\n", vsi->vsi_num);
>    668			return -EEXIST;
>    669		}
>    670	
>    671		for (v_idx = 0; v_idx < vsi->num_q_vectors; v_idx++) {
>    672			err = ice_vsi_alloc_q_vector(vsi, v_idx);
>    673			if (err)
>    674				goto err_out;
>    675		}
>    676	
>    677		v_idx = 0;
>  > 678		for_each_numa_hop_mask(aff_mask, numa_node) {
>    679			for_each_cpu_andnot(cpu, aff_mask, last_aff_mask)
>    680				if (v_idx < vsi->num_q_vectors) {
>    681					if (cpu_online(cpu))
>    682						cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
>    683					v_idx++;
>    684				}
>    685			last_aff_mask = aff_mask;
>    686		}
>    687	
>    688		return 0;
>    689	
>    690	err_out:
>    691		while (v_idx--)
>    692			ice_free_q_vector(vsi, v_idx);
>    693	
>    694		dev_err(dev, "Failed to allocate %d q_vector for VSI %d, ret=%d\n",
>    695			vsi->num_q_vectors, vsi->vsi_num, err);
>    696		vsi->num_q_vectors = 0;
>    697		return err;
>    698	}
>    699	
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-08 15:39 ` [PATCH 1/1] ice: Change assigning method of the CPU affinity masks Pawel Chmielewski
                     ` (4 preceding siblings ...)
  2023-02-08 23:21   ` Jakub Kicinski
@ 2023-02-09  5:14   ` kernel test robot
  2023-02-16 14:54   ` [PATCH v2 " Pawel Chmielewski
  6 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-02-09  5:14 UTC (permalink / raw)
  To: Pawel Chmielewski, yury.norov
  Cc: llvm, oe-kbuild-all, Jonathan.Cameron, andriy.shevchenko, baohua,
	bristot, bsegall, davem, dietmar.eggemann, gal, gregkh, hca,
	jacob.e.keller, jesse.brandeburg, jgg, juri.lelli, kuba, leonro,
	linux-crypto, linux-kernel, linux-rdma, linux, mgorman, mingo,
	netdev, peter, peterz, rostedt, saeedm, tariqt, tony.luck

Hi Pawel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tnguy-next-queue/dev-queue]
[also build test ERROR on linus/master v6.2-rc7 next-20230208]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pawel-Chmielewski/ice-Change-assigning-method-of-the-CPU-affinity-masks/20230208-234144
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git dev-queue
patch link:    https://lore.kernel.org/r/20230208153905.109912-1-pawel.chmielewski%40intel.com
patch subject: [PATCH 1/1] ice: Change assigning method of the CPU affinity masks
config: i386-randconfig-a013 (https://download.01.org/0day-ci/archive/20230209/202302091302.PaJoE0gU-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/33971c3245ae75900dbc4cc9aa2b76ff9cdb534c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Pawel-Chmielewski/ice-Change-assigning-method-of-the-CPU-affinity-masks/20230208-234144
        git checkout 33971c3245ae75900dbc4cc9aa2b76ff9cdb534c
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/net/ethernet/intel/ice/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302091302.PaJoE0gU-lkp@intel.com

All errors (new ones prefixed by >>):

>> drivers/net/ethernet/intel/ice/ice_base.c:662:23: error: no member named 'numa_node' in 'struct device'
           int numa_node = dev->numa_node;
                           ~~~  ^
>> drivers/net/ethernet/intel/ice/ice_base.c:678:2: error: implicit declaration of function 'for_each_numa_hop_mask' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
           for_each_numa_hop_mask(aff_mask, numa_node) {
           ^
>> drivers/net/ethernet/intel/ice/ice_base.c:678:45: error: expected ';' after expression
           for_each_numa_hop_mask(aff_mask, numa_node) {
                                                      ^
                                                      ;
   3 errors generated.


vim +662 drivers/net/ethernet/intel/ice/ice_base.c

   650	
   651	/**
   652	 * ice_vsi_alloc_q_vectors - Allocate memory for interrupt vectors
   653	 * @vsi: the VSI being configured
   654	 *
   655	 * We allocate one q_vector per queue interrupt. If allocation fails we
   656	 * return -ENOMEM.
   657	 */
   658	int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
   659	{
   660		cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
   661		struct device *dev = ice_pf_to_dev(vsi->back);
 > 662		int numa_node = dev->numa_node;
   663		u16 v_idx, cpu = 0;
   664		int err;
   665	
   666		if (vsi->q_vectors[0]) {
   667			dev_dbg(dev, "VSI %d has existing q_vectors\n", vsi->vsi_num);
   668			return -EEXIST;
   669		}
   670	
   671		for (v_idx = 0; v_idx < vsi->num_q_vectors; v_idx++) {
   672			err = ice_vsi_alloc_q_vector(vsi, v_idx);
   673			if (err)
   674				goto err_out;
   675		}
   676	
   677		v_idx = 0;
 > 678		for_each_numa_hop_mask(aff_mask, numa_node) {
   679			for_each_cpu_andnot(cpu, aff_mask, last_aff_mask)
   680				if (v_idx < vsi->num_q_vectors) {
   681					if (cpu_online(cpu))
   682						cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
   683					v_idx++;
   684				}
   685			last_aff_mask = aff_mask;
   686		}
   687	
   688		return 0;
   689	
   690	err_out:
   691		while (v_idx--)
   692			ice_free_q_vector(vsi, v_idx);
   693	
   694		dev_err(dev, "Failed to allocate %d q_vector for VSI %d, ret=%d\n",
   695			vsi->num_q_vectors, vsi->vsi_num, err);
   696		vsi->num_q_vectors = 0;
   697		return err;
   698	}
   699	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-08 15:39 ` [PATCH 1/1] ice: Change assigning method of the CPU affinity masks Pawel Chmielewski
                     ` (5 preceding siblings ...)
  2023-02-09  5:14   ` kernel test robot
@ 2023-02-16 14:54   ` Pawel Chmielewski
  2023-02-16 15:14     ` Andy Shevchenko
  6 siblings, 1 reply; 36+ messages in thread
From: Pawel Chmielewski @ 2023-02-16 14:54 UTC (permalink / raw)
  To: pawel.chmielewski
  Cc: Jonathan.Cameron, andriy.shevchenko, baohua, bristot, bsegall,
	davem, dietmar.eggemann, gal, gregkh, hca, jacob.e.keller,
	jesse.brandeburg, jgg, juri.lelli, kuba, leonro, linux-crypto,
	linux-kernel, linux-rdma, linux, mgorman, mingo, netdev, peter,
	peterz, rostedt, saeedm, tariqt, tony.luck, torvalds,
	ttoukan.linux, vincent.guittot, vschneid, yury.norov

With the introduction of sched_numa_hop_mask() and for_each_numa_hop_mask(),
the affinity masks for queue vectors can be conveniently set by preferring the
CPUs that are closest to the NUMA node of the parent PCI device.

Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
---

Changes since v1:
 * Removed obsolete comment
 * Inverted condition for loop escape
 * Incremented v_idx only when the CPU is online
---
 drivers/net/ethernet/intel/ice/ice_base.c | 24 +++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 9e36f01dfa4f..27b00d224c5d 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -121,9 +121,6 @@ static int ice_vsi_alloc_q_vector(struct ice_vsi *vsi, u16 v_idx)
 
 	if (vsi->type == ICE_VSI_VF)
 		goto out;
-	/* only set affinity_mask if the CPU is online */
-	if (cpu_online(v_idx))
-		cpumask_set_cpu(v_idx, &q_vector->affinity_mask);
 
 	/* This will not be called in the driver load path because the netdev
 	 * will not be created yet. All other cases with register the NAPI
@@ -659,8 +656,10 @@ int ice_vsi_wait_one_rx_ring(struct ice_vsi *vsi, bool ena, u16 rxq_idx)
  */
 int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
 {
+	cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
 	struct device *dev = ice_pf_to_dev(vsi->back);
-	u16 v_idx;
+	int numa_node = dev_to_node(dev);
+	u16 v_idx, cpu = 0;
 	int err;
 
 	if (vsi->q_vectors[0]) {
@@ -674,6 +673,23 @@ int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
 			goto err_out;
 	}
 
+	v_idx = 0;
+
+	for_each_numa_hop_mask(aff_mask, numa_node) {
+		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask) {
+			if (v_idx >= vsi->num_q_vectors)
+				goto out;
+
+			if (cpu_online(cpu)) {
+				cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
+				v_idx++;
+			}
+		}
+
+		last_aff_mask = aff_mask;
+	}
+
+out:
 	return 0;
 
 err_out:
-- 
2.37.3
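
For reference, the spreading pattern above generalizes as below. This
is a sketch only: dev, nvec and masks[] stand in for driver state, and
the hop masks are RCU-protected, so the walk belongs in an RCU
read-side section (as the mlx5 patch earlier in this series does):

	/* Spread nvec vectors across CPUs, closest NUMA hop first. */
	const struct cpumask *mask, *prev = cpu_none_mask;
	unsigned int cpu, v = 0;

	rcu_read_lock();
	for_each_numa_hop_mask(mask, dev_to_node(dev)) {
		/* Hop masks are cumulative, so skip CPUs already
		 * seen at a closer hop.
		 */
		for_each_cpu_andnot(cpu, mask, prev) {
			if (v >= nvec)
				goto done;
			if (cpu_online(cpu))
				cpumask_set_cpu(cpu, &masks[v++]);
		}
		prev = mask;
	}
done:
	rcu_read_unlock();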


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-16 14:54   ` [PATCH v2 " Pawel Chmielewski
@ 2023-02-16 15:14     ` Andy Shevchenko
  2023-02-16 15:16       ` Andy Shevchenko
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Shevchenko @ 2023-02-16 15:14 UTC (permalink / raw)
  To: Pawel Chmielewski
  Cc: Jonathan.Cameron, baohua, bristot, bsegall, davem,
	dietmar.eggemann, gal, gregkh, hca, jacob.e.keller,
	jesse.brandeburg, jgg, juri.lelli, kuba, leonro, linux-crypto,
	linux-kernel, linux-rdma, linux, mgorman, mingo, netdev, peter,
	peterz, rostedt, saeedm, tariqt, tony.luck, torvalds,
	ttoukan.linux, vincent.guittot, vschneid, yury.norov

On Thu, Feb 16, 2023 at 03:54:55PM +0100, Pawel Chmielewski wrote:
> With the introduction of sched_numa_hop_mask() and for_each_numa_hop_mask(),
> the affinity masks for queue vectors can be conveniently set by preferring the
> CPUs that are closest to the NUMA node of the parent PCI device.

...

> +	v_idx = 0;

> +

Redundant blank line.

> +	for_each_numa_hop_mask(aff_mask, numa_node) {
> +		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask) {
> +			if (v_idx >= vsi->num_q_vectors)

> +				goto out;

Useless. You can return 0; here.

> +			if (cpu_online(cpu)) {
> +				cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
> +				v_idx++;
> +			}
> +		}
> +
> +		last_aff_mask = aff_mask;
> +	}
> +
> +out:
>  	return 0;

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/1] ice: Change assigning method of the CPU affinity masks
  2023-02-16 15:14     ` Andy Shevchenko
@ 2023-02-16 15:16       ` Andy Shevchenko
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Shevchenko @ 2023-02-16 15:16 UTC (permalink / raw)
  To: Pawel Chmielewski
  Cc: Jonathan.Cameron, baohua, bristot, bsegall, davem,
	dietmar.eggemann, gal, gregkh, hca, jacob.e.keller,
	jesse.brandeburg, jgg, juri.lelli, kuba, leonro, linux-crypto,
	linux-kernel, linux-rdma, linux, mgorman, mingo, netdev, peter,
	peterz, rostedt, saeedm, tariqt, tony.luck, torvalds,
	ttoukan.linux, vincent.guittot, vschneid, yury.norov

On Thu, Feb 16, 2023 at 05:14:33PM +0200, Andy Shevchenko wrote:
> On Thu, Feb 16, 2023 at 03:54:55PM +0100, Pawel Chmielewski wrote:

...

> > +		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask) {
> > +			if (v_idx >= vsi->num_q_vectors)
> 
> > +				goto out;
> 
> Useless. You can return 0; here.
> 
> > +		}

Btw, I briefly checked the other functions nearby in the code, and none
of them uses goto for the success path.
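
For illustration, a goto-free shape of the loop might look like this
(sketch only, reusing the variables from the patch):

	for_each_numa_hop_mask(aff_mask, numa_node) {
		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask) {
			if (v_idx >= vsi->num_q_vectors)
				return 0;	/* all vectors assigned */
			if (cpu_online(cpu))
				cpumask_set_cpu(cpu,
						&vsi->q_vectors[v_idx++]->affinity_mask);
		}
		last_aff_mask = aff_mask;
	}

	return 0;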


-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/9] sched: add sched_numa_find_nth_cpu()
  2023-01-21  4:24 ` [PATCH 3/9] sched: add sched_numa_find_nth_cpu() Yury Norov
  2023-02-03  0:58   ` Chen Yu
  2023-02-07  5:09   ` Jakub Kicinski
@ 2023-02-17  1:39   ` Yury Norov
  2023-02-17 11:11     ` Andy Shevchenko
  2023-02-20 19:46     ` Jakub Kicinski
  2 siblings, 2 replies; 36+ messages in thread
From: Yury Norov @ 2023-02-17  1:39 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Bruno Goncalves, Dietmar Eggemann, Gal Pressman,
	Greg Kroah-Hartman, Haniel Bristot de Oliveira, Heiko Carstens,
	Ingo Molnar, Jacob Keller, Jakub Kicinski, Jason Gunthorpe,
	Jesse Brandeburg, Jonathan Cameron, Juri Lelli, Kees Cook,
	Leon Romanovsky, Linus Torvalds, Mel Gorman, Peter Lafreniere,
	Peter Zijlstra, Rasmus Villemoes, Saeed Mahameed, Steven Rostedt,
	Tariq Toukan, Tariq Toukan, Tony Luck, Valentin Schneider,
	Vincent Guittot
  Cc: linux-crypto, netdev, linux-rdma

Hi Jakub,

Can you please fold in the following patch?

Thanks,
Yury

From: Yury Norov <yury.norov@gmail.com>
Date: Thu, 16 Feb 2023 17:03:30 -0800
Subject: [PATCH] sched/topology: fix KASAN warning in hop_cmp()

Although prev_hop is used only when curr_hop is not the first hop, it's
initialized unconditionally.

Because initialization implies dereferencing, it might happen that
the code dereferences uninitialized memory, which has been spotted by
KASAN. Fix it by reorganizing hop_cmp() logic.

Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/sched/topology.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 48838a05c008..c21b8b1f3537 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2081,14 +2081,19 @@ struct __cmp_key {
 
 static int hop_cmp(const void *a, const void *b)
 {
-	struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
-	struct cpumask **cur_hop = *(struct cpumask ***)b;
+	struct cpumask **prev_hop, **cur_hop = *(struct cpumask ***)b;
 	struct __cmp_key *k = (struct __cmp_key *)a;
 
 	if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
 		return 1;
 
-	k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
+	if (b == k->masks) {
+		k->w = 0;
+		return 0;
+	}
+
+	prev_hop = *((struct cpumask ***)b - 1);
+	k->w = cpumask_weight_and(k->cpus, prev_hop[k->node]);
 	if (k->w <= k->cpu)
 		return 0;
 
-- 
2.34.1
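
For context, hop_cmp() is the comparator handed to bsearch() when
sched_numa_find_nth_cpu() searches the hop-mask array, and bsearch()
may probe the first element, which has no predecessor. A standalone
userspace illustration of the pattern (hypothetical example, not the
kernel code):

#include <stdio.h>
#include <stdlib.h>

static const int *arr_base;	/* first element; set before bsearch() */

/*
 * Find the smallest bound strictly greater than the key. Like the
 * fixed hop_cmp(), check for the first element *before* touching
 * cur[-1]: reading cur[-1] unconditionally goes out of bounds
 * whenever bsearch() probes index 0, which is what KASAN caught.
 */
static int range_cmp(const void *key, const void *elem)
{
	int k = *(const int *)key;
	const int *cur = elem;

	if (k >= *cur)
		return 1;		/* search above this slot */
	if (cur == arr_base)
		return 0;		/* first slot, no predecessor */
	return k >= cur[-1] ? 0 : -1;	/* safe: cur[-1] exists here */
}

int main(void)
{
	int bounds[] = { 10, 20, 30, 40 };
	int key = 25;

	arr_base = bounds;
	int *slot = bsearch(&key, bounds, 4, sizeof(bounds[0]), range_cmp);

	printf("%d falls below bound %d\n", key, slot ? *slot : -1);
	return 0;
}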


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/9] sched: add sched_numa_find_nth_cpu()
  2023-02-17  1:39   ` Yury Norov
@ 2023-02-17 11:11     ` Andy Shevchenko
  2023-02-20 19:46     ` Jakub Kicinski
  1 sibling, 0 replies; 36+ messages in thread
From: Andy Shevchenko @ 2023-02-17 11:11 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, David S. Miller, Barry Song, Ben Segall,
	Bruno Goncalves, Dietmar Eggemann, Gal Pressman,
	Greg Kroah-Hartman, Haniel Bristot de Oliveira, Heiko Carstens,
	Ingo Molnar, Jacob Keller, Jakub Kicinski, Jason Gunthorpe,
	Jesse Brandeburg, Jonathan Cameron, Juri Lelli, Kees Cook,
	Leon Romanovsky, Linus Torvalds, Mel Gorman, Peter Lafreniere,
	Peter Zijlstra, Rasmus Villemoes, Saeed Mahameed, Steven Rostedt,
	Tariq Toukan, Tariq Toukan, Tony Luck, Valentin Schneider,
	Vincent Guittot, linux-crypto, netdev, linux-rdma

On Thu, Feb 16, 2023 at 05:39:08PM -0800, Yury Norov wrote:

> From: Yury Norov <yury.norov@gmail.com>
> Date: Thu, 16 Feb 2023 17:03:30 -0800
> Subject: [PATCH] sched/topology: fix KASAN warning in hop_cmp()
> 
> Although prev_hop is used only when curr_hop is not the first hop,

curr --> cur

> it's initialized unconditionally.
> 
> Because initialization implies dereferencing, it might happen that
> the code dereferences uninitialized memory, which has been spotted by
> KASAN. Fix it by reorganizing hop_cmp() logic.

Nice catch! I guess it deserves a comment inside the code.
(IIRC, I was puzzled by the logic behind it, and it was changed
due to the lack of this knowledge.)
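
Something along these lines, perhaps (hypothetical wording, applied
on top of the fix above):

	/*
	 * b may point at the first element of k->masks, in which case
	 * there is no previous hop to consult: bail out before deriving
	 * prev_hop, or the dereference reads before the array (the
	 * out-of-bounds access KASAN reported).
	 */
	if (b == k->masks) {
		k->w = 0;
		return 0;
	}

	prev_hop = *((struct cpumask ***)b - 1);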


-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/9] sched: add sched_numa_find_nth_cpu()
  2023-02-17  1:39   ` Yury Norov
  2023-02-17 11:11     ` Andy Shevchenko
@ 2023-02-20 19:46     ` Jakub Kicinski
  1 sibling, 0 replies; 36+ messages in thread
From: Jakub Kicinski @ 2023-02-20 19:46 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, David S. Miller, Andy Shevchenko, Barry Song,
	Ben Segall, Bruno Goncalves, Dietmar Eggemann, Gal Pressman,
	Greg Kroah-Hartman, Haniel Bristot de Oliveira, Heiko Carstens,
	Ingo Molnar, Jacob Keller, Jason Gunthorpe, Jesse Brandeburg,
	Jonathan Cameron, Juri Lelli, Kees Cook, Leon Romanovsky,
	Linus Torvalds, Mel Gorman, Peter Lafreniere, Peter Zijlstra,
	Rasmus Villemoes, Saeed Mahameed, Steven Rostedt, Tariq Toukan,
	Tariq Toukan, Tony Luck, Valentin Schneider, Vincent Guittot,
	linux-crypto, netdev, linux-rdma

On Thu, 16 Feb 2023 17:39:08 -0800 Yury Norov wrote:
> Although prev_hop is used only when curr_hop is not the first hop,
> it's initialized unconditionally.
> 
> Because initialization implies dereferencing, it might happen that
> the code dereferences uninitialized memory, which has been spotted by
> KASAN. Fix it by reorganizing hop_cmp() logic.
> 
> Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Fixed the spelling pointed out by Andy and applied, thanks!

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2023-02-20 19:47 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-21  4:24 [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Yury Norov
2023-01-21  4:24 ` [PATCH 1/9] lib/find: introduce find_nth_and_andnot_bit Yury Norov
2023-01-21  4:24 ` [PATCH 2/9] cpumask: introduce cpumask_nth_and_andnot Yury Norov
2023-01-21  4:24 ` [PATCH 3/9] sched: add sched_numa_find_nth_cpu() Yury Norov
2023-02-03  0:58   ` Chen Yu
2023-02-07  5:09   ` Jakub Kicinski
2023-02-07 10:29     ` Valentin Schneider
2023-02-17  1:39   ` Yury Norov
2023-02-17 11:11     ` Andy Shevchenko
2023-02-20 19:46     ` Jakub Kicinski
2023-01-21  4:24 ` [PATCH 4/9] cpumask: improve on cpumask_local_spread() locality Yury Norov
2023-01-21  4:24 ` [PATCH 5/9] lib/cpumask: reorganize cpumask_local_spread() logic Yury Norov
2023-01-21  4:24 ` [PATCH 6/9] sched/topology: Introduce sched_numa_hop_mask() Yury Norov
2023-01-21  4:24 ` [PATCH 7/9] sched/topology: Introduce for_each_numa_hop_mask() Yury Norov
2023-01-21  4:24 ` [PATCH 8/9] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints Yury Norov
2023-01-21  4:24 ` [PATCH 9/9] lib/cpumask: update comment for cpumask_local_spread() Yury Norov
2023-01-22 12:57 ` [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality Tariq Toukan
2023-01-23  9:57   ` Valentin Schneider
2023-01-29  8:07     ` Tariq Toukan
2023-01-30 20:22       ` Jakub Kicinski
2023-02-02 17:33         ` Jakub Kicinski
2023-02-02 17:37           ` Yury Norov
2023-02-08  2:25     ` Jakub Kicinski
2023-02-08  4:20 ` patchwork-bot+netdevbpf
2023-02-08 15:39 ` [PATCH 1/1] ice: Change assigning method of the CPU affinity masks Pawel Chmielewski
     [not found]   ` <CAH-L+nO+KyzPSX_F0fh+9i=0rW1hoBPFTGbXc1EX+4MGYOR1kA@mail.gmail.com>
2023-02-08 16:08     ` Andy Shevchenko
2023-02-08 16:39   ` Yury Norov
2023-02-08 16:58     ` Andy Shevchenko
2023-02-08 19:11   ` kernel test robot
2023-02-09  2:41     ` Philip Li
2023-02-08 19:22   ` kernel test robot
2023-02-08 23:21   ` Jakub Kicinski
2023-02-09  5:14   ` kernel test robot
2023-02-16 14:54   ` [PATCH v2 " Pawel Chmielewski
2023-02-16 15:14     ` Andy Shevchenko
2023-02-16 15:16       ` Andy Shevchenko
