* [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
@ 2023-04-20  5:19 Yury Norov
  2023-04-20  5:19 ` [PATCH v2 1/8] lib/find: add find_next_and_andnot_bit() Yury Norov
                   ` (7 more replies)
  0 siblings, 8 replies; 36+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in the kernel, and it's beneficial to create
a NUMA-aware version of the macro.

The recently added for_each_numa_hop_mask() works, but switching the
existing codebase to it is not an easy process.

This series adds for_each_numa_cpu(), which is designed to be similar to
for_each_cpu(). It allows converting existing code to a NUMA-aware version
by simply adding a hop iterator variable and passing it to the new macro.
for_each_numa_cpu() takes care of the rest.

At the moment, we have 2 users of NUMA-aware enumerators. One is
Mellanox's in-tree driver, and another is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

        for_each_numa_hop_mask(cpus, node) {
                for_each_cpu_andnot(cpu, cpus, prev) {
                        if (cnt++ == max_num)
                                goto out;
                        do_something(cpu);
                }
                prev = cpus;
        }

With the new macro, the same traversal has a more standard look:

        for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

A straight conversion of the existing for_each_cpu() codebase to the
NUMA-aware version with for_each_numa_hop_mask() is difficult because
the latter doesn't take a user-provided cpu mask, and eventually ends
up as an open-coded double loop. With for_each_numa_cpu() it shouldn't
be a brainteaser.
Consider the NUMA-ignorant example:

        cpumask_t cpus = get_mask();
        int cnt = 0, cpu;

        for_each_cpu(cpu, &cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Converting it to the NUMA-aware version is as simple as:

        cpumask_t cpus = get_mask();
        int node = get_node();
        int cnt = 0, hop, cpu;

        for_each_numa_cpu(cpu, hop, node, &cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

The latter looks slightly more verbose, but it avoids open-coding that
annoying double loop. Another advantage is that it works with a 'hop'
parameter that has the clear meaning of NUMA distance, and doesn't force
people unfamiliar with the enumerator internals to bother with the
current and previous masks machinery.

v2:
 - rebase on top of master;
 - clean up comments and tweak them to comply with kernel-doc;
 - remove RFC from patch #8 as there were no objections.

Yury Norov (8):
  lib/find: add find_next_and_andnot_bit()
  sched/topology: introduce sched_numa_find_next_cpu()
  sched/topology: add for_each_numa_cpu() macro
  net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  lib/cpumask: update comment to cpumask_local_spread()
  sched/topology: export sched_domains_numa_levels
  lib: add test for for_each_numa_{cpu,hop_mask}()
  sched: drop for_each_numa_hop_mask()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 ++----
 include/linux/find.h                         | 43 ++++++++++++++
 include/linux/topology.h                     | 37 ++++++------
 kernel/sched/topology.c                      | 59 +++++++++++---------
 lib/cpumask.c                                |  7 +--
 lib/find_bit.c                               | 12 ++++
 lib/test_bitmap.c                            | 16 ++++++
 7 files changed, 133 insertions(+), 57 deletions(-)

-- 
2.34.1


* [PATCH v2 1/8] lib/find: add find_next_and_andnot_bit()
  2023-04-20  5:19 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
@ 2023-04-20  5:19 ` Yury Norov
  2023-04-20  5:19 ` [PATCH v2 2/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Similarly to find_nth_and_andnot_bit(), find_next_and_andnot_bit() is
a convenient helper that allows traversing bitmaps without storing
intermediate results in a temporary bitmap.

In the following patches the function is used to implement NUMA-aware
CPU enumeration.
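
For illustration, this is the open-coded alternative that the helper
avoids (a sketch only; NBITS, the bitmap pointers and 'start' are
placeholders, and 'tmp' is the temporary bitmap mentioned above):

	DECLARE_BITMAP(tmp, NBITS);
	unsigned long bit;

	/* Open-coded: the intermediate result lands in 'tmp' */
	bitmap_and(tmp, addr1, addr2, NBITS);
	bitmap_andnot(tmp, tmp, addr3, NBITS);
	bit = find_next_bit(tmp, NBITS, start);

	/* With the new helper: same result, no temporary bitmap */
	bit = find_next_and_andnot_bit(addr1, addr2, addr3, NBITS, start);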

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/find.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 lib/find_bit.c       | 12 ++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index 5e4f39ef2e72..90b68d76c073 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -16,6 +16,9 @@ unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned l
 					unsigned long nbits, unsigned long start);
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
+unsigned long _find_next_and_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+					const unsigned long *addr3, unsigned long nbits,
+					unsigned long start);
 unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
 					 unsigned long start);
 extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
@@ -159,6 +162,40 @@ unsigned long find_next_or_bit(const unsigned long *addr1,
 }
 #endif
 
+#ifndef find_next_and_andnot_bit
+/**
+ * find_next_and_andnot_bit - find the next bit set in *addr1 and *addr2,
+ *			      excluding all the bits in *addr3
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @addr3: The third address to base the search on
+ * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Return: the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static __always_inline
+unsigned long find_next_and_andnot_bit(const unsigned long *addr1,
+				   const unsigned long *addr2,
+				   const unsigned long *addr3,
+				   unsigned long size,
+				   unsigned long offset)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr1 & *addr2 & ~*addr3 & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_next_and_andnot_bit(addr1, addr2, addr3, size, offset);
+}
+#endif
+
 #ifndef find_next_zero_bit
 /**
  * find_next_zero_bit - find the next cleared bit in a memory region
@@ -568,6 +605,12 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 	     (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
 	     (bit)++)
 
+#define for_each_and_andnot_bit(bit, addr1, addr2, addr3, size) \
+	for ((bit) = 0;									\
+	     (bit) = find_next_and_andnot_bit((addr1), (addr2), (addr3), (size), (bit)),\
+	     (bit) < (size);								\
+	     (bit)++)
+
 #define for_each_or_bit(bit, addr1, addr2, size) \
 	for ((bit) = 0;									\
 	     (bit) = find_next_or_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 32f99e9a670e..4403e00890b1 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -182,6 +182,18 @@ unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned l
 EXPORT_SYMBOL(_find_next_andnot_bit);
 #endif
 
+#ifndef find_next_and_andnot_bit
+unsigned long _find_next_and_andnot_bit(const unsigned long *addr1,
+					const unsigned long *addr2,
+					const unsigned long *addr3,
+					unsigned long nbits,
+					unsigned long start)
+{
+	return FIND_NEXT_BIT(addr1[idx] & addr2[idx] & ~addr3[idx], /* nop */, nbits, start);
+}
+EXPORT_SYMBOL(_find_next_and_andnot_bit);
+#endif
+
 #ifndef find_next_or_bit
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start)
-- 
2.34.1


* [PATCH v2 2/8] sched/topology: introduce sched_numa_find_next_cpu()
  2023-04-20  5:19 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
  2023-04-20  5:19 ` [PATCH v2 1/8] lib/find: add find_next_and_andnot_bit() Yury Norov
@ 2023-04-20  5:19 ` Yury Norov
  2023-04-25  9:54   ` Valentin Schneider
  2023-04-20  5:19 ` [PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

The function searches for the next CPU in a given cpumask according to
the NUMA topology, traversing CPUs hop by hop.

If the CPU is the last one in a given hop, sched_numa_find_next_cpu()
switches to the next hop and picks the first CPU from there, excluding
those already traversed.
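
A sketch of the intended calling pattern ('cpus', 'node' and
do_something() are placeholders; patch #3 wraps exactly this loop into
the for_each_numa_cpu() macro):

	unsigned int hop = 0;
	int cpu;

	rcu_read_lock();
	for (cpu = 0;
	     (cpu = sched_numa_find_next_cpu(cpus, cpu, node, &hop)) < nr_cpu_ids;
	     cpu++)
		do_something(cpu);	/* CPUs come in increasing NUMA distance */
	rcu_read_unlock();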

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h |  7 +++++++
 kernel/sched/topology.c  | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index fea32377f7c7..13209095d6e2 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -247,6 +247,7 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 
 #ifdef CONFIG_NUMA
 int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
+int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop);
 extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
 #else
 static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
@@ -254,6 +255,12 @@ static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, i
 	return cpumask_nth(cpu, cpus);
 }
 
+static __always_inline
+int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop)
+{
+	return find_next_bit(cpumask_bits(cpus), small_cpumask_bits, cpu);
+}
+
 static inline const struct cpumask *
 sched_numa_hop_mask(unsigned int node, unsigned int hops)
 {
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 051aaf65c749..fc163e4181e6 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2130,6 +2130,45 @@ int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 }
 EXPORT_SYMBOL_GPL(sched_numa_find_nth_cpu);
 
+/**
+ * sched_numa_find_next_cpu() - given the NUMA topology, find the next cpu
+ * @cpus: cpumask to find a cpu from
+ * @cpu: current cpu
+ * @node: local node
+ * @hop: (in/out) indicates distance order of current CPU to a local node
+ *
+ * The function searches for next cpu at a given NUMA distance, indicated
+ * by hop, and if nothing found, tries to find CPUs at a greater distance,
+ * starting from the beginning.
+ *
+ * Return: cpu, or >= nr_cpu_ids when nothing found.
+ */
+int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop)
+{
+	unsigned long *cur, *prev;
+	struct cpumask ***masks;
+	unsigned int ret;
+
+	if (*hop >= sched_domains_numa_levels)
+		return nr_cpu_ids;
+
+	masks = rcu_dereference(sched_domains_numa_masks);
+	cur = cpumask_bits(masks[*hop][node]);
+	if (*hop == 0)
+		ret = find_next_and_bit(cpumask_bits(cpus), cur, nr_cpu_ids, cpu);
+	else {
+		prev = cpumask_bits(masks[*hop - 1][node]);
+		ret = find_next_and_andnot_bit(cpumask_bits(cpus), cur, prev, nr_cpu_ids, cpu);
+	}
+
+	if (ret < nr_cpu_ids)
+		return ret;
+
+	*hop += 1;
+	return sched_numa_find_next_cpu(cpus, 0, node, hop);
+}
+EXPORT_SYMBOL_GPL(sched_numa_find_next_cpu);
+
 /**
  * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
  *                         @node
-- 
2.34.1


* [PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-20  5:19 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
  2023-04-20  5:19 ` [PATCH v2 1/8] lib/find: add find_next_and_andnot_bit() Yury Norov
  2023-04-20  5:19 ` [PATCH v2 2/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
@ 2023-04-20  5:19 ` Yury Norov
  2023-04-25  9:54   ` Valentin Schneider
  2023-04-20  5:19 ` [PATCH v2 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in the kernel, and it's beneficial to
create a NUMA-aware version of the macro.

The recently added for_each_numa_hop_mask() works, but switching the
existing codebase to using it is not an easy process.

The new for_each_numa_cpu() is designed to be similar to for_each_cpu().
It allows converting existing code to a NUMA-aware version by simply
adding a hop iterator variable and passing it to the new macro.
for_each_numa_cpu() takes care of the rest.

At the moment, we have 2 users of NUMA-aware enumerators. One is
Mellanox's in-tree driver, and another is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

	for_each_numa_hop_mask(cpus, node) {
		for_each_cpu_andnot(cpu, cpus, prev) {
			if (cnt++ == max_num)
				goto out;
			do_something(cpu);
		}
		prev = cpus;
	}

With the new macro, it would look like this:

	for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
	}

A straight conversion of the existing for_each_cpu() codebase to the
NUMA-aware version with for_each_numa_hop_mask() is difficult because
the latter doesn't take a user-provided cpu mask, and eventually ends
up as an open-coded double loop. With for_each_numa_cpu() it shouldn't
be a brainteaser.
Consider the NUMA-ignorant example:

	cpumask_t cpus = get_mask();
	int cnt = 0, cpu;

	for_each_cpu(cpu, &cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
	}

Converting it to the NUMA-aware version is as simple as:

	cpumask_t cpus = get_mask();
	int node = get_node();
	int cnt = 0, hop, cpu;

	for_each_numa_cpu(cpu, hop, node, &cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
	}

The latter looks slightly more verbose, but it avoids open-coding that
annoying double loop. Another advantage is that it works with a 'hop'
parameter that has the clear meaning of NUMA distance, and doesn't force
people unfamiliar with the enumerator internals to bother with the
current and previous masks machinery.
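
Note that, like for_each_numa_hop_mask(), the new macro dereferences
sched_domains_numa_masks and therefore requires the RCU read lock; a
minimal sketch, with do_something() as a placeholder:

	rcu_read_lock();
	for_each_numa_cpu(cpu, hop, node, mask)
		do_something(cpu);
	rcu_read_unlock();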

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 13209095d6e2..01fb3a55d7ce 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -286,4 +286,20 @@ sched_numa_hop_mask(unsigned int node, unsigned int hops)
 	     !IS_ERR_OR_NULL(mask);					       \
 	     __hops++)
 
+/**
+ * for_each_numa_cpu - iterate over cpus in increasing order taking into account
+ *		       NUMA distances from a given node.
+ * @cpu: the (optionally unsigned) integer iterator
+ * @hop: the hop iterator variable; initialized to 0 by the macro.
+ * @node: the NUMA node to start the search from.
+ * @mask: the cpumask pointer
+ *
+ * Requires rcu_lock to be held.
+ */
+#define for_each_numa_cpu(cpu, hop, node, mask)					\
+	for ((cpu) = 0, (hop) = 0;						\
+		(cpu) = sched_numa_find_next_cpu((mask), (cpu), (node), &(hop)),\
+		(cpu) < nr_cpu_ids;						\
+		(cpu)++)
+
 #endif /* _LINUX_TOPOLOGY_H */
-- 
2.34.1


* [PATCH v2 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  2023-04-20  5:19 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (2 preceding siblings ...)
  2023-04-20  5:19 ` [PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
@ 2023-04-20  5:19 ` Yury Norov
  2023-04-20  8:27   ` Tariq Toukan
  2023-04-20  5:19 ` [PATCH v2 5/8] lib/cpumask: update comment to cpumask_local_spread() Yury Norov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_numa_cpu() is a more straightforward alternative to
for_each_numa_hop_mask() + for_each_cpu_andnot().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 38b32e98f3bd..80368952e9b1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -817,12 +817,10 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
 static int comp_irqs_request(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = dev->priv.eq_table;
-	const struct cpumask *prev = cpu_none_mask;
-	const struct cpumask *mask;
 	int ncomp_eqs = table->num_comp_eqs;
 	u16 *cpus;
 	int ret;
-	int cpu;
+	int cpu, hop;
 	int i;
 
 	ncomp_eqs = table->num_comp_eqs;
@@ -844,15 +842,11 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
 
 	i = 0;
 	rcu_read_lock();
-	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
-		for_each_cpu_andnot(cpu, mask, prev) {
-			cpus[i] = cpu;
-			if (++i == ncomp_eqs)
-				goto spread_done;
-		}
-		prev = mask;
+	for_each_numa_cpu(cpu, hop, dev->priv.numa_node, cpu_possible_mask) {
+		cpus[i] = cpu;
+		if (++i == ncomp_eqs)
+			break;
 	}
-spread_done:
 	rcu_read_unlock();
 	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
 	kfree(cpus);
-- 
2.34.1


* [PATCH v2 5/8] lib/cpumask: update comment to cpumask_local_spread()
  2023-04-20  5:19 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (3 preceding siblings ...)
  2023-04-20  5:19 ` [PATCH v2 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
@ 2023-04-20  5:19 ` Yury Norov
  2023-04-20  5:19 ` [PATCH v2 6/8] sched/topology: export sched_domains_numa_levels Yury Norov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Now that we have for_each_numa_cpu(), which is a more straightforward
replacement for cpumask_local_spread() when it comes to enumeration of
CPUs with respect to NUMA topology, it's worth updating the comment.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/cpumask.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index e7258836b60b..151d1dc5c593 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -127,11 +127,8 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
  *
  * There's a better alternative based on for_each()-like iterators:
  *
- *	for_each_numa_hop_mask(mask, node) {
- *		for_each_cpu_andnot(cpu, mask, prev)
- *			do_something(cpu);
- *		prev = mask;
- *	}
+ *	for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
+ *		do_something(cpu);
  *
  * It's simpler and more verbose than above. Complexity of iterator-based
  * enumeration is O(sched_domains_numa_levels * nr_cpu_ids), while
-- 
2.34.1


* [PATCH v2 6/8] sched/topology: export sched_domains_numa_levels
  2023-04-20  5:19 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (4 preceding siblings ...)
  2023-04-20  5:19 ` [PATCH v2 5/8] lib/cpumask: update comment to cpumask_local_spread() Yury Norov
@ 2023-04-20  5:19 ` Yury Norov
  2023-04-20  5:19 ` [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}() Yury Norov
  2023-04-20  5:19 ` [PATCH v2 8/8] sched: drop for_each_numa_hop_mask() Yury Norov
  7 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

The following patch adds a test for the NUMA-aware CPU enumerators, and
it requires access to sched_domains_numa_levels.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h |  7 +++++++
 kernel/sched/topology.c  | 10 ++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 01fb3a55d7ce..7ebcc886dc76 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -43,6 +43,13 @@
 	for_each_online_node(node)			\
 		if (nr_cpus_node(node))
 
+#ifdef CONFIG_NUMA
+extern int __sched_domains_numa_levels;
+#define sched_domains_numa_levels ((const int)__sched_domains_numa_levels)
+#else
+#define sched_domains_numa_levels (1)
+#endif
+
 int arch_update_cpu_topology(void);
 
 /* Conform to ACPI 2.0 SLIT distance definitions */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index fc163e4181e6..56daa279c411 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1508,7 +1508,9 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
 #ifdef CONFIG_NUMA
 enum numa_topology_type sched_numa_topology_type;
 
-static int			sched_domains_numa_levels;
+int				__sched_domains_numa_levels;
+EXPORT_SYMBOL_GPL(__sched_domains_numa_levels);
+
 static int			sched_domains_curr_level;
 
 int				sched_max_numa_distance;
@@ -1872,7 +1874,7 @@ void sched_init_numa(int offline_node)
 	 *
 	 * We reset it to 'nr_levels' at the end of this function.
 	 */
-	sched_domains_numa_levels = 0;
+	__sched_domains_numa_levels = 0;
 
 	masks = kzalloc(sizeof(void *) * nr_levels, GFP_KERNEL);
 	if (!masks)
@@ -1948,7 +1950,7 @@ void sched_init_numa(int offline_node)
 	sched_domain_topology_saved = sched_domain_topology;
 	sched_domain_topology = tl;
 
-	sched_domains_numa_levels = nr_levels;
+	__sched_domains_numa_levels = nr_levels;
 	WRITE_ONCE(sched_max_numa_distance, sched_domains_numa_distance[nr_levels - 1]);
 
 	init_numa_topology_type(offline_node);
@@ -1961,7 +1963,7 @@ static void sched_reset_numa(void)
 	struct cpumask ***masks;
 
 	nr_levels = sched_domains_numa_levels;
-	sched_domains_numa_levels = 0;
+	__sched_domains_numa_levels = 0;
 	sched_max_numa_distance = 0;
 	sched_numa_topology_type = NUMA_DIRECT;
 	distances = sched_domains_numa_distance;
-- 
2.34.1


* [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}()
  2023-04-20  5:19 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (5 preceding siblings ...)
  2023-04-20  5:19 ` [PATCH v2 6/8] sched/topology: export sched_domains_numa_levels Yury Norov
@ 2023-04-20  5:19 ` Yury Norov
  2023-04-24 17:09   ` Valentin Schneider
  2023-04-20  5:19 ` [PATCH v2 8/8] sched: drop for_each_numa_hop_mask() Yury Norov
  7 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

The test ensures that the enumerators' output is consistent with
cpumask_local_spread().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/test_bitmap.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index a8005ad3bd58..1b5f805f6879 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -12,6 +12,7 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/topology.h>
 #include <linux/uaccess.h>
 
 #include "../tools/testing/selftests/kselftest_module.h"
@@ -751,6 +752,33 @@ static void __init test_for_each_set_bit_wrap(void)
 	}
 }
 
+static void __init test_for_each_numa(void)
+{
+	unsigned int cpu, node;
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		const struct cpumask *m, *p = cpu_none_mask;
+		unsigned int c = 0;
+
+		rcu_read_lock();
+		for_each_numa_hop_mask(m, node) {
+			for_each_cpu_andnot(cpu, m, p)
+				expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+			p = m;
+		}
+		rcu_read_unlock();
+	}
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		unsigned int hop, c = 0;
+
+		rcu_read_lock();
+		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
+			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+		rcu_read_unlock();
+	}
+}
+
 static void __init test_for_each_set_bit(void)
 {
 	DECLARE_BITMAP(orig, 500);
@@ -1237,6 +1265,7 @@ static void __init selftest(void)
 	test_for_each_clear_bitrange_from();
 	test_for_each_set_clump8();
 	test_for_each_set_bit_wrap();
+	test_for_each_numa();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
-- 
2.34.1


* [PATCH v2 8/8] sched: drop for_each_numa_hop_mask()
  2023-04-20  5:19 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (6 preceding siblings ...)
  2023-04-20  5:19 ` [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}() Yury Norov
@ 2023-04-20  5:19 ` Yury Norov
  2023-04-20 10:15   ` kernel test robot
                     ` (2 more replies)
  7 siblings, 3 replies; 36+ messages in thread
From: Yury Norov @ 2023-04-20  5:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Now that we have for_each_numa_cpu(), for_each_numa_hop_mask() and
all related code is dead code. Drop it.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h | 25 -------------------------
 kernel/sched/topology.c  | 32 --------------------------------
 lib/test_bitmap.c        | 13 -------------
 3 files changed, 70 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 7ebcc886dc76..1225ade33053 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -255,7 +255,6 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 #ifdef CONFIG_NUMA
 int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
 int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop);
-extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
 #else
 static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 {
@@ -267,32 +266,8 @@ int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsi
 {
 	return find_next_bit(cpumask_bits(cpus), small_cpumask_bits, cpu);
 }
-
-static inline const struct cpumask *
-sched_numa_hop_mask(unsigned int node, unsigned int hops)
-{
-	return ERR_PTR(-EOPNOTSUPP);
-}
 #endif	/* CONFIG_NUMA */
 
-/**
- * for_each_numa_hop_mask - iterate over cpumasks of increasing NUMA distance
- *                          from a given node.
- * @mask: the iteration variable.
- * @node: the NUMA node to start the search from.
- *
- * Requires rcu_lock to be held.
- *
- * Yields cpu_online_mask for @node == NUMA_NO_NODE.
- */
-#define for_each_numa_hop_mask(mask, node)				       \
-	for (unsigned int __hops = 0;					       \
-	     mask = (node != NUMA_NO_NODE || __hops) ?			       \
-		     sched_numa_hop_mask(node, __hops) :		       \
-		     cpu_online_mask,					       \
-	     !IS_ERR_OR_NULL(mask);					       \
-	     __hops++)
-
 /**
  * for_each_numa_cpu - iterate over cpus in increasing order taking into account
  *		       NUMA distances from a given node.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 56daa279c411..9d08ffdbd2d8 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2171,38 +2171,6 @@ int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsi
 }
 EXPORT_SYMBOL_GPL(sched_numa_find_next_cpu);
 
-/**
- * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
- *                         @node
- * @node: The node to count hops from.
- * @hops: Include CPUs up to that many hops away. 0 means local node.
- *
- * Return: On success, a pointer to a cpumask of CPUs at most @hops away from
- * @node, an error value otherwise.
- *
- * Requires rcu_lock to be held. Returned cpumask is only valid within that
- * read-side section, copy it if required beyond that.
- *
- * Note that not all hops are equal in distance; see sched_init_numa() for how
- * distances and masks are handled.
- * Also note that this is a reflection of sched_domains_numa_masks, which may change
- * during the lifetime of the system (offline nodes are taken out of the masks).
- */
-const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops)
-{
-	struct cpumask ***masks;
-
-	if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
-		return ERR_PTR(-EINVAL);
-
-	masks = rcu_dereference(sched_domains_numa_masks);
-	if (!masks)
-		return ERR_PTR(-EBUSY);
-
-	return masks[hops][node];
-}
-EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
-
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 1b5f805f6879..6becb044a66f 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -756,19 +756,6 @@ static void __init test_for_each_numa(void)
 {
 	unsigned int cpu, node;
 
-	for (node = 0; node < sched_domains_numa_levels; node++) {
-		const struct cpumask *m, *p = cpu_none_mask;
-		unsigned int c = 0;
-
-		rcu_read_lock();
-		for_each_numa_hop_mask(m, node) {
-			for_each_cpu_andnot(cpu, m, p)
-				expect_eq_uint(cpumask_local_spread(c++, node), cpu);
-			p = m;
-		}
-		rcu_read_unlock();
-	}
-
 	for (node = 0; node < sched_domains_numa_levels; node++) {
 		unsigned int hop, c = 0;
 
-- 
2.34.1


* Re: [PATCH v2 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  2023-04-20  5:19 ` [PATCH v2 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
@ 2023-04-20  8:27   ` Tariq Toukan
  2023-04-20 22:45     ` Yury Norov
  0 siblings, 1 reply; 36+ messages in thread
From: Tariq Toukan @ 2023-04-20  8:27 UTC (permalink / raw)
  To: Yury Norov, Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song



On 20/04/2023 8:19, Yury Norov wrote:
> for_each_numa_cpu() is a more straightforward alternative to
> for_each_numa_hop_mask() + for_each_cpu_andnot().
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 +++++-----------
>   1 file changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 38b32e98f3bd..80368952e9b1 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -817,12 +817,10 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
>   static int comp_irqs_request(struct mlx5_core_dev *dev)
>   {
>   	struct mlx5_eq_table *table = dev->priv.eq_table;
> -	const struct cpumask *prev = cpu_none_mask;
> -	const struct cpumask *mask;
>   	int ncomp_eqs = table->num_comp_eqs;
>   	u16 *cpus;
>   	int ret;
> -	int cpu;
> +	int cpu, hop;
>   	int i;
>   
>   	ncomp_eqs = table->num_comp_eqs;
> @@ -844,15 +842,11 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
>   
>   	i = 0;
>   	rcu_read_lock();
> -	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
> -		for_each_cpu_andnot(cpu, mask, prev) {
> -			cpus[i] = cpu;
> -			if (++i == ncomp_eqs)
> -				goto spread_done;
> -		}
> -		prev = mask;
> +	for_each_numa_cpu(cpu, hop, dev->priv.numa_node, cpu_possible_mask) {

I like this clean API.

nit:
Previously cpu_online_mask was used here. Is this change intentional?
We can fix it in a followup patch if this is the only comment on the series.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

> +		cpus[i] = cpu;
> +		if (++i == ncomp_eqs)
> +			break;
>   	}
> -spread_done:
>   	rcu_read_unlock();
>   	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
>   	kfree(cpus);


* Re: [PATCH v2 8/8] sched: drop for_each_numa_hop_mask()
  2023-04-20  5:19 ` [PATCH v2 8/8] sched: drop for_each_numa_hop_mask() Yury Norov
@ 2023-04-20 10:15   ` kernel test robot
  2023-04-21  8:38   ` kernel test robot
  2023-04-21  8:38   ` kernel test robot
  2 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-04-20 10:15 UTC (permalink / raw)
  To: Yury Norov; +Cc: oe-kbuild-all

Hi Yury,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]
[also build test WARNING on linus/master v6.3-rc7]
[cannot apply to tip/sched/core horms-ipvs/master net-next/main next-20230419]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yury-Norov/lib-find-add-find_next_and_andnot_bit/20230420-132247
patch link:    https://lore.kernel.org/r/20230420051946.7463-9-yury.norov%40gmail.com
patch subject: [PATCH v2 8/8] sched: drop for_each_numa_hop_mask()
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20230420/202304201832.dPYSOjPk-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/2425882bb68586aad08582c2efbb860a78e1b8e3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Yury-Norov/lib-find-add-find_next_and_andnot_bit/20230420-132247
        git checkout 2425882bb68586aad08582c2efbb860a78e1b8e3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304201832.dPYSOjPk-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/intel/ice/ice_base.c: In function 'ice_vsi_alloc_q_vectors':
   drivers/net/ethernet/intel/ice/ice_base.c:681:9: error: implicit declaration of function 'for_each_numa_hop_mask'; did you mean 'for_each_numa_cpu'? [-Werror=implicit-function-declaration]
     681 |         for_each_numa_hop_mask(aff_mask, numa_node) {
         |         ^~~~~~~~~~~~~~~~~~~~~~
         |         for_each_numa_cpu
   drivers/net/ethernet/intel/ice/ice_base.c:681:52: error: expected ';' before '{' token
     681 |         for_each_numa_hop_mask(aff_mask, numa_node) {
         |                                                    ^~
         |                                                    ;
>> drivers/net/ethernet/intel/ice/ice_base.c:665:20: warning: unused variable 'cpu' [-Wunused-variable]
     665 |         u16 v_idx, cpu = 0;
         |                    ^~~
>> drivers/net/ethernet/intel/ice/ice_base.c:662:31: warning: unused variable 'last_aff_mask' [-Wunused-variable]
     662 |         cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
         |                               ^~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/cpu +665 drivers/net/ethernet/intel/ice/ice_base.c

eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  652  
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  653  /**
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  654   * ice_vsi_alloc_q_vectors - Allocate memory for interrupt vectors
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  655   * @vsi: the VSI being configured
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  656   *
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  657   * We allocate one q_vector per queue interrupt. If allocation fails we
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  658   * return -ENOMEM.
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  659   */
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  660  int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  661  {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16 @662  	cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
3306f79f428e85 Anirudh Venkataramanan 2020-02-06  663  	struct device *dev = ice_pf_to_dev(vsi->back);
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  664  	int numa_node = dev->numa_node;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16 @665  	u16 v_idx, cpu = 0;
88865fc4bbd61e Karol Kolacinski       2020-05-07  666  	int err;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  667  
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  668  	if (vsi->q_vectors[0]) {
4015d11e4b9720 Brett Creeley          2019-11-08  669  		dev_dbg(dev, "VSI %d has existing q_vectors\n", vsi->vsi_num);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  670  		return -EEXIST;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  671  	}
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  672  
3306f79f428e85 Anirudh Venkataramanan 2020-02-06  673  	for (v_idx = 0; v_idx < vsi->num_q_vectors; v_idx++) {
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  674  		err = ice_vsi_alloc_q_vector(vsi, v_idx);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  675  		if (err)
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  676  			goto err_out;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  677  	}
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  678  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  679  	v_idx = 0;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  680  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  681  	for_each_numa_hop_mask(aff_mask, numa_node) {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  682  		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask) {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  683  			if (v_idx >= vsi->num_q_vectors)
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  684  				goto out;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  685  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  686  			if (cpu_online(cpu)) {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  687  				cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  688  				v_idx++;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  689  			}
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  690  		}
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  691  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  692  		last_aff_mask = aff_mask;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  693  	}
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  694  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  695  out:
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  696  	return 0;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  697  
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  698  err_out:
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  699  	while (v_idx--)
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  700  		ice_free_q_vector(vsi, v_idx);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  701  
4015d11e4b9720 Brett Creeley          2019-11-08  702  	dev_err(dev, "Failed to allocate %d q_vector for VSI %d, ret=%d\n",
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  703  		vsi->num_q_vectors, vsi->vsi_num, err);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  704  	vsi->num_q_vectors = 0;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  705  	return err;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  706  }
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  707  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

* Re: [PATCH v2 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  2023-04-20  8:27   ` Tariq Toukan
@ 2023-04-20 22:45     ` Yury Norov
  0 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-04-20 22:45 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Thu, Apr 20, 2023 at 11:27:26AM +0300, Tariq Toukan wrote:
> I like this clean API.

Thanks :)
 
> nit:
> Previously cpu_online_mask was used here. Is this change intentional?
> We can fix it in a followup patch if this is the only comment on the series.
> 
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

The only CPUs listed in sched_domains_numa_masks are 'available',
i.e. online CPUs. for_each_numa_cpu() ANDs the user-provided cpumask
with the mask associated with the hop, which means that even when we
AND with the possible mask, we'll eventually walk online CPUs only.
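
Schematically, at each hop the enumerator visits

	/* at hop 0 it's simply cpus & masks[0][node] */
	cpus & masks[hop][node] & ~masks[hop - 1][node]

which mirrors the find_next_and_andnot_bit() call in patch #2, so any
CPU missing from the hop masks, i.e. any offline CPU, is filtered out
no matter what mask the user passes in.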

To make sure, I experimented with the modified test:

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 6becb044a66f..c8d557731080 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -760,8 +760,13 @@ static void __init test_for_each_numa(void)
                unsigned int hop, c = 0;

                rcu_read_lock();
-               for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
+               pr_err("Node %d:\t", node);
+               for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
                        expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+                       pr_cont("%3d", cpu);
+
+               }
+               pr_err("\n");
                rcu_read_unlock();
        }
 }

This is the NUMA topology of my test machine after the boot:

    root@debian:~# numactl -H
    available: 4 nodes (0-3)
    node 0 cpus: 0 1 2 3
    node 0 size: 1861 MB
    node 0 free: 1792 MB
    node 1 cpus: 4 5
    node 1 size: 1914 MB
    node 1 free: 1823 MB
    node 2 cpus: 6 7
    node 2 size: 1967 MB
    node 2 free: 1915 MB
    node 3 cpus: 8 9 10 11 12 13 14 15
    node 3 size: 7862 MB
    node 3 free: 7259 MB
    node distances:
    node   0   1   2   3
      0:  10  50  30  70
      1:  50  10  70  30
      2:  30  70  10  50
      3:  70  30  50  10

And this is what test prints:

     root@debian:~# insmod test_bitmap.ko
     test_bitmap: loaded.
     test_bitmap: parselist: 14: input is '0-2047:128/256' OK, Time: 472
     test_bitmap: bitmap_print_to_pagebuf: input is '0-32767
     ', Time: 2665
     test_bitmap: Node 0:	  0  1  2  3  6  7  4  5  8  9 10 11 12 13 14 15
     test_bitmap:
     test_bitmap: Node 1:	  4  5  8  9 10 11 12 13 14 15  0  1  2  3  6  7
     test_bitmap:
     test_bitmap: Node 2:	  6  7  0  1  2  3  8  9 10 11 12 13 14 15  4  5
     test_bitmap:
     test_bitmap: Node 3:	  8  9 10 11 12 13 14 15  4  5  6  7  0  1  2  3
     test_bitmap:
     test_bitmap: all 6614 tests passed

Now, disable a couple of CPUs:

     root@debian:~# chcpu -d 1-2
     smpboot: CPU 1 is now offline
     CPU 1 disabled
     smpboot: CPU 2 is now offline
     CPU 2 disabled

And try again:

     root@debian:~# rmmod test_bitmap
     rmmod: ERROR: ../libkmod/libkmod[  320.275904] test_bitmap: unloaded.
     root@debian:~# numactl -H
     available: 4 nodes (0-3)
     node 0 cpus: 0 3
     node 0 size: 1861 MB
     node 0 free: 1792 MB
     node 1 cpus: 4 5
     node 1 size: 1914 MB
     node 1 free: 1823 MB
     node 2 cpus: 6 7
     node 2 size: 1967 MB
     node 2 free: 1915 MB
     node 3 cpus: 8 9 10 11 12 13 14 15
     node 3 size: 7862 MB
     node 3 free: 7259 MB
     node distances:
     node   0   1   2   3
       0:  10  50  30  70
       1:  50  10  70  30
       2:  30  70  10  50
       3:  70  30  50  10
     root@debian:~# insmod test_bitmap.ko
     test_bitmap: loaded.
     test_bitmap: parselist: 14: input is '0-2047:128/256' OK, Time: 491
     test_bitmap: bitmap_print_to_pagebuf: input is '0-32767
     ', Time: 2174
     test_bitmap: Node 0:	  0  3  6  7  4  5  8  9 10 11 12 13 14 15
     test_bitmap:
     test_bitmap: Node 1:	  4  5  8  9 10 11 12 13 14 15  0  3  6  7
     test_bitmap:
     test_bitmap: Node 2:	  6  7  0  3  8  9 10 11 12 13 14 15  4  5
     test_bitmap:
     test_bitmap: Node 3:	  8  9 10 11 12 13 14 15  4  5  6  7  0  3
     test_bitmap:
     test_bitmap: all 6606 tests passed

I used cpu_possible_mask because I wanted to keep the patch
consistent: before we traversed NUMA hop masks, now we traverse the
same hop masks AND user-provided mask, so the latter should include
all possible CPUs.

If you think it's better to have cpu_online_mask in the driver, let's
make it in a separate patch?

Thanks,
Yury

* Re: [PATCH v2 8/8] sched: drop for_each_numa_hop_mask()
  2023-04-20  5:19 ` [PATCH v2 8/8] sched: drop for_each_numa_hop_mask() Yury Norov
  2023-04-20 10:15   ` kernel test robot
  2023-04-21  8:38   ` kernel test robot
@ 2023-04-21  8:38   ` kernel test robot
  2 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-04-21  8:38 UTC (permalink / raw)
  To: Yury Norov; +Cc: llvm, oe-kbuild-all

Hi Yury,

kernel test robot noticed the following build errors:

[auto build test ERROR on net/main]
[also build test ERROR on linus/master v6.3-rc7]
[cannot apply to tip/sched/core horms-ipvs/master net-next/main next-20230420]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yury-Norov/lib-find-add-find_next_and_andnot_bit/20230420-132247
patch link:    https://lore.kernel.org/r/20230420051946.7463-9-yury.norov%40gmail.com
patch subject: [PATCH v2 8/8] sched: drop for_each_numa_hop_mask()
config: i386-randconfig-a013 (https://download.01.org/0day-ci/archive/20230421/202304211617.4nr99gar-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/2425882bb68586aad08582c2efbb860a78e1b8e3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Yury-Norov/lib-find-add-find_next_and_andnot_bit/20230420-132247
        git checkout 2425882bb68586aad08582c2efbb860a78e1b8e3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/net/ethernet/intel/ice/ kernel/rcu/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304211617.4nr99gar-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/net/ethernet/intel/ice/ice_base.c:664:23: error: no member named 'numa_node' in 'struct device'
           int numa_node = dev->numa_node;
                           ~~~  ^
>> drivers/net/ethernet/intel/ice/ice_base.c:681:2: error: implicit declaration of function 'for_each_numa_hop_mask' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
           for_each_numa_hop_mask(aff_mask, numa_node) {
           ^
>> drivers/net/ethernet/intel/ice/ice_base.c:681:45: error: expected ';' after expression
           for_each_numa_hop_mask(aff_mask, numa_node) {
                                                      ^
                                                      ;
   3 errors generated.


vim +/for_each_numa_hop_mask +681 drivers/net/ethernet/intel/ice/ice_base.c

eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  652  
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  653  /**
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  654   * ice_vsi_alloc_q_vectors - Allocate memory for interrupt vectors
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  655   * @vsi: the VSI being configured
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  656   *
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  657   * We allocate one q_vector per queue interrupt. If allocation fails we
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  658   * return -ENOMEM.
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  659   */
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  660  int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  661  {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  662  	cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
3306f79f428e85 Anirudh Venkataramanan 2020-02-06  663  	struct device *dev = ice_pf_to_dev(vsi->back);
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16 @664  	int numa_node = dev->numa_node;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  665  	u16 v_idx, cpu = 0;
88865fc4bbd61e Karol Kolacinski       2020-05-07  666  	int err;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  667  
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  668  	if (vsi->q_vectors[0]) {
4015d11e4b9720 Brett Creeley          2019-11-08  669  		dev_dbg(dev, "VSI %d has existing q_vectors\n", vsi->vsi_num);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  670  		return -EEXIST;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  671  	}
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  672  
3306f79f428e85 Anirudh Venkataramanan 2020-02-06  673  	for (v_idx = 0; v_idx < vsi->num_q_vectors; v_idx++) {
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  674  		err = ice_vsi_alloc_q_vector(vsi, v_idx);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  675  		if (err)
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  676  			goto err_out;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  677  	}
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  678  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  679  	v_idx = 0;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  680  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16 @681  	for_each_numa_hop_mask(aff_mask, numa_node) {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  682  		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask) {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  683  			if (v_idx >= vsi->num_q_vectors)
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  684  				goto out;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  685  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  686  			if (cpu_online(cpu)) {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  687  				cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  688  				v_idx++;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  689  			}
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  690  		}
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  691  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  692  		last_aff_mask = aff_mask;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  693  	}
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  694  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  695  out:
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  696  	return 0;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  697  
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  698  err_out:
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  699  	while (v_idx--)
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  700  		ice_free_q_vector(vsi, v_idx);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  701  
4015d11e4b9720 Brett Creeley          2019-11-08  702  	dev_err(dev, "Failed to allocate %d q_vector for VSI %d, ret=%d\n",
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  703  		vsi->num_q_vectors, vsi->vsi_num, err);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  704  	vsi->num_q_vectors = 0;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  705  	return err;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  706  }
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  707  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 8/8] sched: drop for_each_numa_hop_mask()
  2023-04-20  5:19 ` [PATCH v2 8/8] sched: drop for_each_numa_hop_mask() Yury Norov
  2023-04-20 10:15   ` kernel test robot
@ 2023-04-21  8:38   ` kernel test robot
  2023-04-21  8:38   ` kernel test robot
  2 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2023-04-21  8:38 UTC (permalink / raw)
  To: Yury Norov; +Cc: llvm, oe-kbuild-all

Hi Yury,

kernel test robot noticed the following build errors:

[auto build test ERROR on net/main]
[also build test ERROR on linus/master v6.3-rc7]
[cannot apply to tip/sched/core horms-ipvs/master net-next/main next-20230420]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yury-Norov/lib-find-add-find_next_and_andnot_bit/20230420-132247
patch link:    https://lore.kernel.org/r/20230420051946.7463-9-yury.norov%40gmail.com
patch subject: [PATCH v2 8/8] sched: drop for_each_numa_hop_mask()
config: powerpc-randconfig-r016-20230417 (https://download.01.org/0day-ci/archive/20230421/202304211610.1Xprvad2-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 437b7602e4a998220871de78afcb020b9c14a661)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/2425882bb68586aad08582c2efbb860a78e1b8e3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Yury-Norov/lib-find-add-find_next_and_andnot_bit/20230420-132247
        git checkout 2425882bb68586aad08582c2efbb860a78e1b8e3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash drivers/net/ethernet/intel/ice/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304211610.1Xprvad2-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/net/ethernet/intel/ice/ice_base.c:664:23: error: no member named 'numa_node' in 'struct device'
           int numa_node = dev->numa_node;
                           ~~~  ^
>> drivers/net/ethernet/intel/ice/ice_base.c:681:2: error: call to undeclared function 'for_each_numa_hop_mask'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           for_each_numa_hop_mask(aff_mask, numa_node) {
           ^
   drivers/net/ethernet/intel/ice/ice_base.c:681:45: error: expected ';' after expression
           for_each_numa_hop_mask(aff_mask, numa_node) {
                                                      ^
                                                      ;
   3 errors generated.


vim +/for_each_numa_hop_mask +681 drivers/net/ethernet/intel/ice/ice_base.c

eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  652  
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  653  /**
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  654   * ice_vsi_alloc_q_vectors - Allocate memory for interrupt vectors
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  655   * @vsi: the VSI being configured
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  656   *
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  657   * We allocate one q_vector per queue interrupt. If allocation fails we
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  658   * return -ENOMEM.
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  659   */
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  660  int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  661  {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  662  	cpumask_t *aff_mask, *last_aff_mask = cpu_none_mask;
3306f79f428e85 Anirudh Venkataramanan 2020-02-06  663  	struct device *dev = ice_pf_to_dev(vsi->back);
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  664  	int numa_node = dev->numa_node;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  665  	u16 v_idx, cpu = 0;
88865fc4bbd61e Karol Kolacinski       2020-05-07  666  	int err;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  667  
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  668  	if (vsi->q_vectors[0]) {
4015d11e4b9720 Brett Creeley          2019-11-08  669  		dev_dbg(dev, "VSI %d has existing q_vectors\n", vsi->vsi_num);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  670  		return -EEXIST;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  671  	}
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  672  
3306f79f428e85 Anirudh Venkataramanan 2020-02-06  673  	for (v_idx = 0; v_idx < vsi->num_q_vectors; v_idx++) {
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  674  		err = ice_vsi_alloc_q_vector(vsi, v_idx);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  675  		if (err)
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  676  			goto err_out;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  677  	}
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  678  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  679  	v_idx = 0;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  680  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16 @681  	for_each_numa_hop_mask(aff_mask, numa_node) {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  682  		for_each_cpu_andnot(cpu, aff_mask, last_aff_mask) {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  683  			if (v_idx >= vsi->num_q_vectors)
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  684  				goto out;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  685  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  686  			if (cpu_online(cpu)) {
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  687  				cpumask_set_cpu(cpu, &vsi->q_vectors[v_idx]->affinity_mask);
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  688  				v_idx++;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  689  			}
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  690  		}
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  691  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  692  		last_aff_mask = aff_mask;
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  693  	}
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  694  
5dfb0e8f9fff34 Pawel Chmielewski      2023-02-16  695  out:
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  696  	return 0;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  697  
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  698  err_out:
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  699  	while (v_idx--)
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  700  		ice_free_q_vector(vsi, v_idx);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  701  
4015d11e4b9720 Brett Creeley          2019-11-08  702  	dev_err(dev, "Failed to allocate %d q_vector for VSI %d, ret=%d\n",
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  703  		vsi->num_q_vectors, vsi->vsi_num, err);
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  704  	vsi->num_q_vectors = 0;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  705  	return err;
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  706  }
eff380aaffedb2 Anirudh Venkataramanan 2019-10-24  707  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}()
  2023-04-20  5:19 ` [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}() Yury Norov
@ 2023-04-24 17:09   ` Valentin Schneider
  2023-04-26  5:50     ` Yury Norov
  0 siblings, 1 reply; 36+ messages in thread
From: Valentin Schneider @ 2023-04-24 17:09 UTC (permalink / raw)
  To: Yury Norov, Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 19/04/23 22:19, Yury Norov wrote:
> +	for (node = 0; node < sched_domains_numa_levels; node++) {
> +		unsigned int hop, c = 0;
> +
> +		rcu_read_lock();
> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> +		rcu_read_unlock();
> +	}

I'm not fond of the export of sched_domains_numa_levels, especially
considering it's just there for tests.

Furthermore, is there any value in testing parity with
cpumask_local_spread()? Rather, shouldn't we check that using this API does
yield CPUs of increasing NUMA distance?

Something like

        for_each_node(node) {
                unsigned int prev_cpu, hop = 0;

                cpu = cpumask_first(cpumask_of_node(node));
                prev_cpu = cpu;

                rcu_read_lock();

                /* Assert distance is monotonically increasing */
                for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
                        expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
                        prev_cpu = cpu;
                }

                rcu_read_unlock();
        }


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-20  5:19 ` [PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
@ 2023-04-25  9:54   ` Valentin Schneider
  2023-04-26  5:32     ` Yury Norov
  0 siblings, 1 reply; 36+ messages in thread
From: Valentin Schneider @ 2023-04-25  9:54 UTC (permalink / raw)
  To: Yury Norov, Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 19/04/23 22:19, Yury Norov wrote:
> +/**
> + * for_each_numa_cpu - iterate over cpus in increasing order taking into account
> + *		       NUMA distances from a given node.
> + * @cpu: the (optionally unsigned) integer iterator
> + * @hop: the iterator variable, must be initialized to a desired minimal hop.
> + * @node: the NUMA node to start the search from.
> + * @mask: the cpumask pointer
> + *
> + * Requires rcu_lock to be held.
> + */
> +#define for_each_numa_cpu(cpu, hop, node, mask)					\
> +	for ((cpu) = 0, (hop) = 0;						\
> +		(cpu) = sched_numa_find_next_cpu((mask), (cpu), (node), &(hop)),\
> +		(cpu) < nr_cpu_ids;						\
> +		(cpu)++)
> +

I think we can keep sched_numa_find_next_cpu() as-is, but could we make
that macro use cpu_possible_mask by default? We can always add a variant
if/when we need to feed in a different mask.
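
Something like this, say (hypothetical names, just to sketch the idea):

        /* Mask-taking variant, for when a custom mask is needed: */
        #define for_each_numa_cpu_mask(cpu, hop, node, mask)                    \
                for ((cpu) = 0, (hop) = 0;                                      \
                     (cpu) = sched_numa_find_next_cpu((mask), (cpu), (node), &(hop)),\
                     (cpu) < nr_cpu_ids;                                        \
                     (cpu)++)

        /* Default: all possible CPUs. */
        #define for_each_numa_cpu(cpu, hop, node)                               \
                for_each_numa_cpu_mask(cpu, hop, node, cpu_possible_mask)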

>  #endif /* _LINUX_TOPOLOGY_H */
> -- 
> 2.34.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/8] sched/topology: introduce sched_numa_find_next_cpu()
  2023-04-20  5:19 ` [PATCH v2 2/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
@ 2023-04-25  9:54   ` Valentin Schneider
  2023-04-26  5:26     ` Yury Norov
  0 siblings, 1 reply; 36+ messages in thread
From: Valentin Schneider @ 2023-04-25  9:54 UTC (permalink / raw)
  To: Yury Norov, Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 19/04/23 22:19, Yury Norov wrote:
> +/*
> + * sched_numa_find_next_cpu() - given the NUMA topology, find the next cpu
> + * cpumask: cpumask to find a cpu from
> + * cpu: current cpu
> + * node: local node
> + * hop: (in/out) indicates distance order of current CPU to a local node
> + *
> + * The function searches for next cpu at a given NUMA distance, indicated
> + * by hop, and if nothing found, tries to find CPUs at a greater distance,
> + * starting from the beginning.
> + *
> + * Return: cpu, or >= nr_cpu_ids when nothing found.
> + */
> +int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop)
> +{
> +	unsigned long *cur, *prev;
> +	struct cpumask ***masks;
> +	unsigned int ret;
> +
> +	if (*hop >= sched_domains_numa_levels)
> +		return nr_cpu_ids;
> +
> +	masks = rcu_dereference(sched_domains_numa_masks);
> +	cur = cpumask_bits(masks[*hop][node]);
> +	if (*hop == 0)
> +		ret = find_next_and_bit(cpumask_bits(cpus), cur, nr_cpu_ids, cpu);
> +	else {
> +		prev = cpumask_bits(masks[*hop - 1][node]);
> +		ret = find_next_and_andnot_bit(cpumask_bits(cpus), cur, prev, nr_cpu_ids, cpu);
> +	}
> +
> +	if (ret < nr_cpu_ids)
> +		return ret;
> +
> +	*hop += 1;
> +	return sched_numa_find_next_cpu(cpus, 0, node, hop);

sched_domains_numa_levels is a fairly small number, so the recursion depth
isn't something we really need to worry about - still, the iterative
variant of this is fairly straightforward to get to:

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index e850f16c003ae..4c9a9e48fef6d 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2151,23 +2151,27 @@ int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsi
 	struct cpumask ***masks;
 	unsigned int ret;
 
-	if (*hop >= sched_domains_numa_levels)
-		return nr_cpu_ids;
+	/*
+	 * Reset @cpu to 0 when increasing @hop, since CPU numbering has no
+	 * relationship with NUMA distance: a search at @hop+1 may yield CPUs
+	 * of lower ID than previously seen!
+	 */
+	for (; *hop < sched_domains_numa_levels; *hop += 1, cpu = 0) {
+		masks = rcu_dereference(sched_domains_numa_masks);
+		cur = cpumask_bits(masks[*hop][node]);
+
+		if (*hop == 0) {
+			ret = find_next_and_bit(cpumask_bits(cpus), cur, nr_cpu_ids, cpu);
+		} else {
+			prev = cpumask_bits(masks[*hop - 1][node]);
+			ret = find_next_and_andnot_bit(cpumask_bits(cpus), cur, prev, nr_cpu_ids, cpu);
+		}
 
-	masks = rcu_dereference(sched_domains_numa_masks);
-	cur = cpumask_bits(masks[*hop][node]);
-	if (*hop == 0)
-		ret = find_next_and_bit(cpumask_bits(cpus), cur, nr_cpu_ids, cpu);
-	else {
-		prev = cpumask_bits(masks[*hop - 1][node]);
-		ret = find_next_and_andnot_bit(cpumask_bits(cpus), cur, prev, nr_cpu_ids, cpu);
+		if (ret < nr_cpu_ids)
+			return ret;
 	}
 
-	if (ret < nr_cpu_ids)
-		return ret;
-
-	*hop += 1;
-	return sched_numa_find_next_cpu(cpus, 0, node, hop);
+	return nr_cpu_ids;
 }
 EXPORT_SYMBOL_GPL(sched_numa_find_next_cpu);
 


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/8] sched/topology: introduce sched_numa_find_next_cpu()
  2023-04-25  9:54   ` Valentin Schneider
@ 2023-04-26  5:26     ` Yury Norov
  2023-04-26  9:17       ` Valentin Schneider
  0 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-04-26  5:26 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Tue, Apr 25, 2023 at 10:54:56AM +0100, Valentin Schneider wrote:
> On 19/04/23 22:19, Yury Norov wrote:
> > +/*
> > + * sched_numa_find_next_cpu() - given the NUMA topology, find the next cpu
> > + * cpumask: cpumask to find a cpu from
> > + * cpu: current cpu
> > + * node: local node
> > + * hop: (in/out) indicates distance order of current CPU to a local node
> > + *
> > + * The function searches for next cpu at a given NUMA distance, indicated
> > + * by hop, and if nothing found, tries to find CPUs at a greater distance,
> > + * starting from the beginning.
> > + *
> > + * Return: cpu, or >= nr_cpu_ids when nothing found.
> > + */
> > +int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop)
> > +{
> > +	unsigned long *cur, *prev;
> > +	struct cpumask ***masks;
> > +	unsigned int ret;
> > +
> > +	if (*hop >= sched_domains_numa_levels)
> > +		return nr_cpu_ids;
> > +
> > +	masks = rcu_dereference(sched_domains_numa_masks);
> > +	cur = cpumask_bits(masks[*hop][node]);
> > +	if (*hop == 0)
> > +		ret = find_next_and_bit(cpumask_bits(cpus), cur, nr_cpu_ids, cpu);
> > +	else {
> > +		prev = cpumask_bits(masks[*hop - 1][node]);
> > +		ret = find_next_and_andnot_bit(cpumask_bits(cpus), cur, prev, nr_cpu_ids, cpu);
> > +	}
> > +
> > +	if (ret < nr_cpu_ids)
> > +		return ret;
> > +
> > +	*hop += 1;
> > +	return sched_numa_find_next_cpu(cpus, 0, node, hop);
> 
> sched_domains_numa_levels is a fairly small number, so the recursion depth
> isn't something we really need to worry about - still, the iterative
> variant of this is fairly straightforward to get to:

This is tail recursion. The compiler normally converts it into a loop just
as well. At least, my GCC does.
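
To illustrate with a generic sketch (not the kernel code): the recursive
result is returned unchanged, so the call can be turned into a jump:

        /* Tail-recursive form: the recursive call is the last action. */
        static int find_from(const int *arr, int n, int i, int val)
        {
                if (i >= n)
                        return -1;
                if (arr[i] == val)
                        return i;
                return find_from(arr, n, i + 1, val);  /* tail call */
        }

        /* What the optimizer effectively emits: */
        static int find_from_loop(const int *arr, int n, int i, int val)
        {
                for (; i < n; i++) {
                        if (arr[i] == val)
                                return i;
                }
                return -1;
        }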

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-25  9:54   ` Valentin Schneider
@ 2023-04-26  5:32     ` Yury Norov
  2023-04-26  9:17       ` Valentin Schneider
  0 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-04-26  5:32 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Tue, Apr 25, 2023 at 10:54:48AM +0100, Valentin Schneider wrote:
> On 19/04/23 22:19, Yury Norov wrote:
> > +/**
> > + * for_each_numa_cpu - iterate over cpus in increasing order taking into account
> > + *		       NUMA distances from a given node.
> > + * @cpu: the (optionally unsigned) integer iterator
> > + * @hop: the iterator variable, must be initialized to a desired minimal hop.
> > + * @node: the NUMA node to start the search from.
> > + * @mask: the cpumask pointer
> > + *
> > + * Requires rcu_lock to be held.
> > + */
> > +#define for_each_numa_cpu(cpu, hop, node, mask)					\
> > +	for ((cpu) = 0, (hop) = 0;						\
> > +		(cpu) = sched_numa_find_next_cpu((mask), (cpu), (node), &(hop)),\
> > +		(cpu) < nr_cpu_ids;						\
> > +		(cpu)++)
> > +
> 
> I think we can keep sched_numa_find_next_cpu() as-is, but could we make
> that macro use cpu_possible_mask by default? We can always add a variant
> if/when we need to feed in a different mask.

As mentioned in the discussion of the driver's patch, all those NUMA
things cover only online CPUs, so cpu_possible_mask may mislead to some extent.

Anyway, can you elaborate on what exactly you want? Like this?

 #define for_each_numa_online_cpu(cpu, hop, node)       \
        for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
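
Usage would then look roughly like this (do_something() and max_num
are placeholders here):

        unsigned int cpu, hop;
        int cnt = 0;

        rcu_read_lock();
        for_each_numa_online_cpu(cpu, hop, node) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }
        rcu_read_unlock();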

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}()
  2023-04-24 17:09   ` Valentin Schneider
@ 2023-04-26  5:50     ` Yury Norov
  2023-04-26  9:17       ` Valentin Schneider
  0 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-04-26  5:50 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

Hi Valentin,

Thanks for the review!

On Mon, Apr 24, 2023 at 06:09:52PM +0100, Valentin Schneider wrote:
> On 19/04/23 22:19, Yury Norov wrote:
> > +	for (node = 0; node < sched_domains_numa_levels; node++) {
> > +		unsigned int hop, c = 0;
> > +
> > +		rcu_read_lock();
> > +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> > +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> > +		rcu_read_unlock();
> > +	}
> 
> I'm not fond of the export of sched_domains_numa_levels, especially
> considering it's just there for tests.
> 
> Furthermore, is there any value in testing parity with
> cpumask_local_spread()?

I wanted to emphasize that the new NUMA-aware functions are coherent with
each other, just like find_nth_bit() is coherent with find_next_bit().

But all that coherence looks important only in the non-NUMA case, because
client code may depend on the fact that the next CPU is never less than
the current one. This doesn't hold for NUMA iterators anyway...

> Rather, shouldn't we check that using this API does
> yield CPUs of increasing NUMA distance?
> 
> Something like
> 
>         for_each_node(node) {
>                 unsigned int prev_cpu, hop = 0;
> 
>                 cpu = cpumask_first(cpumask_of_node(node));
>                 prev_cpu = cpu;
> 
>                 rcu_read_lock();
> 
>                 /* Assert distance is monotonically increasing */
>                 for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>                         expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
>                         prev_cpu = cpu;
>                 }
> 
>                 rcu_read_unlock();
>         }

Your version of the test looks more straightforward. I need to think
about it some more, but it looks like I can take it in v3.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/8] sched/topology: introduce sched_numa_find_next_cpu()
  2023-04-26  5:26     ` Yury Norov
@ 2023-04-26  9:17       ` Valentin Schneider
  0 siblings, 0 replies; 36+ messages in thread
From: Valentin Schneider @ 2023-04-26  9:17 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 25/04/23 22:26, Yury Norov wrote:
> On Tue, Apr 25, 2023 at 10:54:56AM +0100, Valentin Schneider wrote:
>> On 19/04/23 22:19, Yury Norov wrote:
>> > +/*
>> > + * sched_numa_find_next_cpu() - given the NUMA topology, find the next cpu
>> > + * cpumask: cpumask to find a cpu from
>> > + * cpu: current cpu
>> > + * node: local node
>> > + * hop: (in/out) indicates distance order of current CPU to a local node
>> > + *
>> > + * The function searches for next cpu at a given NUMA distance, indicated
>> > + * by hop, and if nothing found, tries to find CPUs at a greater distance,
>> > + * starting from the beginning.
>> > + *
>> > + * Return: cpu, or >= nr_cpu_ids when nothing found.
>> > + */
>> > +int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop)
>> > +{
>> > +	unsigned long *cur, *prev;
>> > +	struct cpumask ***masks;
>> > +	unsigned int ret;
>> > +
>> > +	if (*hop >= sched_domains_numa_levels)
>> > +		return nr_cpu_ids;
>> > +
>> > +	masks = rcu_dereference(sched_domains_numa_masks);
>> > +	cur = cpumask_bits(masks[*hop][node]);
>> > +	if (*hop == 0)
>> > +		ret = find_next_and_bit(cpumask_bits(cpus), cur, nr_cpu_ids, cpu);
>> > +	else {
>> > +		prev = cpumask_bits(masks[*hop - 1][node]);
>> > +		ret = find_next_and_andnot_bit(cpumask_bits(cpus), cur, prev, nr_cpu_ids, cpu);
>> > +	}
>> > +
>> > +	if (ret < nr_cpu_ids)
>> > +		return ret;
>> > +
>> > +	*hop += 1;
>> > +	return sched_numa_find_next_cpu(cpus, 0, node, hop);
>> 
>> sched_domains_numa_levels is a fairly small number, so the recursion depth
>> isn't something we really need to worry about - still, the iterative
>> variant of this is fairly straightforward to get to:
>
> This is tail recursion. The compiler normally converts it into a loop just
> as well. At least, my GCC does.

I'd hope so in 2023! I still prefer the iterative approach as I find it
more readable, but I'm not /too/ strongly attached to it.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-26  5:32     ` Yury Norov
@ 2023-04-26  9:17       ` Valentin Schneider
  0 siblings, 0 replies; 36+ messages in thread
From: Valentin Schneider @ 2023-04-26  9:17 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 25/04/23 22:32, Yury Norov wrote:
> On Tue, Apr 25, 2023 at 10:54:48AM +0100, Valentin Schneider wrote:
>> On 19/04/23 22:19, Yury Norov wrote:
>> > +/**
>> > + * for_each_numa_cpu - iterate over cpus in increasing order taking into account
>> > + *		       NUMA distances from a given node.
>> > + * @cpu: the (optionally unsigned) integer iterator
>> > + * @hop: the iterator variable, must be initialized to a desired minimal hop.
>> > + * @node: the NUMA node to start the search from.
>> > + * @mask: the cpumask pointer
>> > + *
>> > + * Requires rcu_lock to be held.
>> > + */
>> > +#define for_each_numa_cpu(cpu, hop, node, mask)					\
>> > +	for ((cpu) = 0, (hop) = 0;						\
>> > +		(cpu) = sched_numa_find_next_cpu((mask), (cpu), (node), &(hop)),\
>> > +		(cpu) < nr_cpu_ids;						\
>> > +		(cpu)++)
>> > +
>> 
>> I think we can keep sched_numa_find_next_cpu() as-is, but could we make
>> that macro use cpu_possible_mask by default? We can always add a variant
>> if/when we need to feed in a different mask.
>
> As mentioned in the discussion of the driver's patch, all those NUMA
> things cover only online CPUs, so cpu_possible_mask may mislead to some extent.
>
> Anyway, can you elaborate on what exactly you want? Like this?
>
>  #define for_each_numa_online_cpu(cpu, hop, node)       \
>         for_each_numa_cpu(cpu, hop, node, cpu_online_mask)

Yeah, something like that. Like you said, the NUMA cpumasks built by the
scheduler reflect the online topology, so s/possible/online/ shouldn't
change much here.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}()
  2023-04-26  5:50     ` Yury Norov
@ 2023-04-26  9:17       ` Valentin Schneider
  2023-04-26 20:51         ` Yury Norov
  0 siblings, 1 reply; 36+ messages in thread
From: Valentin Schneider @ 2023-04-26  9:17 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 25/04/23 22:50, Yury Norov wrote:
> Hi Valentin,
>
> Thanks for review!
>
> On Mon, Apr 24, 2023 at 06:09:52PM +0100, Valentin Schneider wrote:
>> On 19/04/23 22:19, Yury Norov wrote:
>> > +	for (node = 0; node < sched_domains_numa_levels; node++) {
>> > +		unsigned int hop, c = 0;
>> > +
>> > +		rcu_read_lock();
>> > +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
>> > +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
>> > +		rcu_read_unlock();
>> > +	}
>> 
>> I'm not fond of the export of sched_domains_numa_levels, especially
>> considering it's just there for tests.
>> 
>> Furthermore, is there any value in testing parity with
>> cpumask_local_spread()?
>
> I wanted to emphasize that the new NUMA-aware functions are coherent with
> each other, just like find_nth_bit() is coherent with find_next_bit().
>
> But all that coherence looks important only in the non-NUMA case, because
> client code may depend on the fact that the next CPU is never less than
> the current one. This doesn't hold for NUMA iterators anyway...
>

Ah right, I see your point. But yes, distance-ordered walks break this
assumption.
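
To make that concrete (a toy example: two nodes, CPUs 0-3 on node 0,
CPUs 4-7 on node 1):

        /* Iterating from node 1: */
        for_each_numa_cpu(cpu, hop, 1, cpu_online_mask)
                pr_cont("%u ", cpu);    /* prints: 4 5 6 7 0 1 2 3 */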

>> Rather, shouldn't we check that using this API does
>> yield CPUs of increasing NUMA distance?
>> 
>> Something like
>> 
>>         for_each_node(node) {
>>                 unsigned int prev_cpu, hop = 0;
>> 
>>                 cpu = cpumask_first(cpumask_of_node(node));
>>                 prev_cpu = cpu;
>> 
>>                 rcu_read_lock();
>> 
>>                 /* Assert distance is monotonically increasing */
>>                 for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>>                         expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
>>                         prev_cpu = cpu;
>>                 }
>> 
>>                 rcu_read_unlock();
>>         }
>
> Your version of the test looks more straightforward. I need to think
> about it some more, but it looks like I can take it in v3.
>

I realized I only wrote half the relevant code - comparing node IDs is
meaningless, I meant to compare distances as we walk through the
CPUs... I tested the below against a few NUMA topologies and it seems to be
sane:

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 6becb044a66f0..8f8512d139d58 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -174,11 +174,23 @@ __check_eq_str(const char *srcfile, unsigned int line,
 	return eq;
 }
 
-#define __expect_eq(suffix, ...)					\
+static bool __init
+__check_ge_uint(const char *srcfile, unsigned int line,
+		const unsigned int a, unsigned int b)
+{
+	if (a < b) {
+		pr_err("[%s:%u] expected a(%u) >= b(%u)\n",
+			srcfile, line, a, b);
+		return false;
+	}
+	return true;
+}
+
+#define __expect_op(op, suffix, ...)					\
 	({								\
 		int result = 0;						\
 		total_tests++;						\
-		if (!__check_eq_ ## suffix(__FILE__, __LINE__,		\
+		if (!__check_## op ## _ ## suffix(__FILE__, __LINE__,	\
 					   ##__VA_ARGS__)) {		\
 			failed_tests++;					\
 			result = 1;					\
@@ -186,6 +198,9 @@ __check_eq_str(const char *srcfile, unsigned int line,
 		result;							\
 	})
 
+#define __expect_eq(suffix, ...) __expect_op(eq, suffix, ##__VA_ARGS__)
+#define __expect_ge(suffix, ...) __expect_op(ge, suffix, ##__VA_ARGS__)
+
 #define expect_eq_uint(...)		__expect_eq(uint, ##__VA_ARGS__)
 #define expect_eq_bitmap(...)		__expect_eq(bitmap, ##__VA_ARGS__)
 #define expect_eq_pbl(...)		__expect_eq(pbl, ##__VA_ARGS__)
@@ -193,6 +208,8 @@ __check_eq_str(const char *srcfile, unsigned int line,
 #define expect_eq_clump8(...)		__expect_eq(clump8, ##__VA_ARGS__)
 #define expect_eq_str(...)		__expect_eq(str, ##__VA_ARGS__)
 
+#define expect_ge_uint(...)		__expect_ge(uint, ##__VA_ARGS__)
+
 static void __init test_zero_clear(void)
 {
 	DECLARE_BITMAP(bmap, 1024);
@@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
 {
 	unsigned int cpu, node;
 
-	for (node = 0; node < sched_domains_numa_levels; node++) {
-		unsigned int hop, c = 0;
+	for_each_node(node) {
+		unsigned int start_cpu, prev_dist, hop = 0;
+
+		cpu = cpumask_first(cpumask_of_node(node));
+		prev_dist = node_distance(node, node);
+		start_cpu = cpu;
 
 		rcu_read_lock();
-		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
-			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+
+		/* Assert distance is monotonically increasing */
+		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
+			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));
+
+			expect_ge_uint(dist, prev_dist);
+			prev_dist = dist;
+		}
+
 		rcu_read_unlock();
 	}
 }


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}()
  2023-04-26  9:17       ` Valentin Schneider
@ 2023-04-26 20:51         ` Yury Norov
  2023-04-27  9:35           ` Valentin Schneider
  0 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-04-26 20:51 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

> I realized I only wrote half the relevant code - comparing node IDs is
> meaningless, I meant to compare distances as we walk through the
> CPUs... I tested the below against a few NUMA topologies and it seems to be
> sane:
> 
> @@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
>  {
>  	unsigned int cpu, node;
>  
> -	for (node = 0; node < sched_domains_numa_levels; node++) {
> -		unsigned int hop, c = 0;
> +	for_each_node(node) {
> +		unsigned int start_cpu, prev_dist, hop = 0;
> +
> +		cpu = cpumask_first(cpumask_of_node(node));
> +		prev_dist = node_distance(node, node);
> +		start_cpu = cpu;
>  
>  		rcu_read_lock();
> -		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> -			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> +
> +		/* Assert distance is monotonically increasing */
> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
> +			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));

Interestingly, node_distance() is an arch-specific function. The generic
implementation is quite useless:

 #define node_distance(from,to)  ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

In particular, arm64 takes the above. With node_distance() implemented
like that, we can barely test anything...

Taking that into account, I think it's better to test the iterator against
cpumask_local_spread(), as in v2. I'll add a comment about that in v3.
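
To spell out the degeneracy: with the generic define, the distances the
test can observe collapse to just two values (LOCAL_DISTANCE and
REMOTE_DISTANCE are 10 and 20 in include/linux/numa.h):

        node_distance(0, 0) == LOCAL_DISTANCE;  /* 10 */
        node_distance(0, 1) == REMOTE_DISTANCE; /* 20 */
        node_distance(0, 2) == REMOTE_DISTANCE; /* 20 */

so "dist >= prev_dist" trivially holds after the first hop, no matter
how the iterator orders the remote nodes.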

> +
> +			expect_ge_uint(dist, prev_dist);
> +			prev_dist = dist;
> +		}
> +
>  		rcu_read_unlock();
>  	}
>  }

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}()
  2023-04-26 20:51         ` Yury Norov
@ 2023-04-27  9:35           ` Valentin Schneider
  0 siblings, 0 replies; 36+ messages in thread
From: Valentin Schneider @ 2023-04-27  9:35 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 26/04/23 13:51, Yury Norov wrote:
>> I realized I only wrote half the relevant code - comparing node IDs is
>> meaningless, I meant to compare distances as we walk through the
>> CPUs... I tested the below against a few NUMA topologies and it seems to be
>> sane:
>> 
>> @@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
>>  {
>>  	unsigned int cpu, node;
>>  
>> -	for (node = 0; node < sched_domains_numa_levels; node++) {
>> -		unsigned int hop, c = 0;
>> +	for_each_node(node) {
>> +		unsigned int start_cpu, prev_dist, hop = 0;
>> +
>> +		cpu = cpumask_first(cpumask_of_node(node));
>> +		prev_dist = node_distance(node, node);
>> +		start_cpu = cpu;
>>  
>>  		rcu_read_lock();
>> -		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
>> -			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
>> +
>> +		/* Assert distance is monotonically increasing */
>> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>> +			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));
>
> Interestingly, node_distance() is an arch-specific function. Generic
> implementation is quite useless:
>
>  #define node_distance(from,to)  ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>
> In particular, arm64 takes the above. With node_distance() implemented
> like that, we can barely test anything...
>

riscv and arm64 rely on drivers/base/arch_numa.c to provide
__node_distance() (cf. CONFIG_GENERIC_ARCH_NUMA).

x86, sparc, powerpc and ia64 define __node_distance();
loongarch and mips define their own node_distance().

So all of those archs will have a usable node_distance(); the others won't,
and the scheduler can't do anything about that - it relies on
node_distance() to understand the topology!
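
For reference, the per-arch wiring typically follows this pattern
(sketched from the x86 variant, simplified; details differ per arch):

        /* arch/<arch>/include/asm/topology.h */
        extern int __node_distance(int from, int to);
        #define node_distance(from, to) __node_distance(from, to)

where __node_distance() returns an entry of a SLIT-like distance table
populated at boot.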


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-05-31 17:08     ` Yury Norov
@ 2023-05-31 17:36       ` Jakub Kicinski
  0 siblings, 0 replies; 36+ messages in thread
From: Jakub Kicinski @ 2023-05-31 17:36 UTC (permalink / raw)
  To: Yury Norov
  Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Wed, 31 May 2023 10:08:58 -0700 Yury Norov wrote:
> On Wed, May 31, 2023 at 10:01:25AM -0700, Jakub Kicinski wrote:
> > On Wed, 31 May 2023 08:43:46 -0700 Yury Norov wrote:  
> > > Now that the series is reviewed, can you consider taking it in the
> > > sched tree?
> > 
> > Do you mean someone else or did you mean the net-next tree?  
> 
> Sorry, net-next.

I'm a bit of a coward. I don't trust my ability to judge this code,
and it seems Linus has opinions about it :( The mlx5 patch looks like 
a small refactoring which can wait until 6.6. I don't feel like net-next
is the best path downstream for this series :(

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-05-31 17:01   ` Jakub Kicinski
@ 2023-05-31 17:08     ` Yury Norov
  2023-05-31 17:36       ` Jakub Kicinski
  0 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-05-31 17:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Wed, May 31, 2023 at 10:01:25AM -0700, Jakub Kicinski wrote:
> On Wed, 31 May 2023 08:43:46 -0700 Yury Norov wrote:
> > On Sun, Apr 30, 2023 at 10:18:01AM -0700, Yury Norov wrote:
> > > for_each_cpu() is widely used in the kernel, and it's beneficial to create
> > > a NUMA-aware version of the macro.
> > > 
> > > Recently added for_each_numa_hop_mask() works, but switching existing
> > > codebase to it is not an easy process.
> > > 
> > > This series adds for_each_numa_cpu(), which is designed to be similar to
> > > for_each_cpu(). It allows converting existing code to a NUMA-aware version
> > > as simply as adding a hop iterator variable and passing it into the new
> > > macro. for_each_numa_cpu() takes care of the rest.
> > 
> > Hi Jakub,
> > 
> > Now that the series is reviewed, can you consider taking it in the
> > sched tree?
> 
> Do you mean someone else or did you mean the net-next tree?

Sorry, net-next.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-05-31 15:43 ` Yury Norov
@ 2023-05-31 17:01   ` Jakub Kicinski
  2023-05-31 17:08     ` Yury Norov
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Kicinski @ 2023-05-31 17:01 UTC (permalink / raw)
  To: Yury Norov
  Cc: netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Wed, 31 May 2023 08:43:46 -0700 Yury Norov wrote:
> On Sun, Apr 30, 2023 at 10:18:01AM -0700, Yury Norov wrote:
> > for_each_cpu() is widely used in the kernel, and it's beneficial to create
> > a NUMA-aware version of the macro.
> > 
> > Recently added for_each_numa_hop_mask() works, but switching existing
> > codebase to it is not an easy process.
> > 
> > This series adds for_each_numa_cpu(), which is designed to be similar to
> > for_each_cpu(). It allows converting existing code to a NUMA-aware version
> > as simply as adding a hop iterator variable and passing it into the new
> > macro. for_each_numa_cpu() takes care of the rest.
> 
> Hi Jakub,
> 
> Now that the series is reviewed, can you consider taking it in the
> sched tree?

Do you mean someone else or did you mean the net-next tree?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
  2023-05-02 16:59 ` Valentin Schneider
@ 2023-05-31 15:43 ` Yury Norov
  2023-05-31 17:01   ` Jakub Kicinski
  1 sibling, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-05-31 15:43 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

On Sun, Apr 30, 2023 at 10:18:01AM -0700, Yury Norov wrote:
> for_each_cpu() is widely used in the kernel, and it's beneficial to create
> a NUMA-aware version of the macro.
> 
> Recently added for_each_numa_hop_mask() works, but switching existing
> codebase to it is not an easy process.
> 
> This series adds for_each_numa_cpu(), which is designed to be similar to
> for_each_cpu(). It allows converting existing code to a NUMA-aware version
> as simply as adding a hop iterator variable and passing it into the new
> macro. for_each_numa_cpu() takes care of the rest.

Hi Jakub,

Now that the series is reviewed, can you consider taking it in the
sched tree?

Thanks,
Yury

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-05-02 21:58   ` Yury Norov
@ 2023-05-03 10:00     ` Valentin Schneider
  0 siblings, 0 replies; 36+ messages in thread
From: Valentin Schneider @ 2023-05-03 10:00 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 02/05/23 14:58, Yury Norov wrote:
>> LGTM, I ran the tests on a few NUMA topologies and that all seems to behave
>> as expected. Thanks for working on this!
>>
>> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
>
Thank you, Valentin. If you spent time testing the series, why
don't you add your Tested-by?

Well, I only ran the test_bitmap stuff and checked the output of the
iterator; I didn't get to test on actual hardware with a Mellanox card
:-)

But yeah, I suppose that does count for the rest, so feel free to add to
all patches but #5:

Tested-by: Valentin Schneider <vschneid@redhat.com>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-05-02 16:59 ` Valentin Schneider
@ 2023-05-02 21:58   ` Yury Norov
  2023-05-03 10:00     ` Valentin Schneider
  0 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-05-02 21:58 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Tariq Toukan, Gal Pressman,
	Greg Kroah-Hartman, Heiko Carstens, Barry Song

> LGTM, I ran the tests on a few NUMA topologies and that all seems to behave
> as expected. Thanks for working on this!
>
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>

> Thank you, Valentin. If you spent time testing the series, why
> don't you add your Tested-by?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
@ 2023-05-02 16:59 ` Valentin Schneider
  2023-05-02 21:58   ` Yury Norov
  2023-05-31 15:43 ` Yury Norov
  1 sibling, 1 reply; 36+ messages in thread
From: Valentin Schneider @ 2023-05-02 16:59 UTC (permalink / raw)
  To: Yury Norov, Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On 30/04/23 10:18, Yury Norov wrote:
> for_each_cpu() is widely used in the kernel, and it's beneficial to create
> a NUMA-aware version of the macro.
>
> Recently added for_each_numa_hop_mask() works, but switching existing
> codebase to it is not an easy process.
>
> This series adds for_each_numa_cpu(), which is designed to be similar to
> for_each_cpu(). It allows converting existing code to a NUMA-aware version
> as simply as adding a hop iterator variable and passing it into the new
> macro. for_each_numa_cpu() takes care of the rest.
>
> At the moment, we have 2 users of NUMA-aware enumerators. One is
> Mellanox's in-tree driver, and another is Intel's in-review driver:
>
> https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/
>
> Both real-life examples follow the same pattern:
>
>         for_each_numa_hop_mask(cpus, prev, node) {
>                 for_each_cpu_andnot(cpu, cpus, prev) {
>                         if (cnt++ == max_num)
>                                 goto out;
>                         do_something(cpu);
>                 }
>                 prev = cpus;
>         }
>
> With the new macro, it has a more standard look, like this:
>
>         for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
>                 if (cnt++ == max_num)
>                         break;
>                 do_something(cpu);
>         }
>
> Straight conversion of the existing for_each_cpu() codebase to a NUMA-aware
> version with for_each_numa_hop_mask() is difficult because it doesn't
> take a user-provided cpu mask, and eventually ends up with an open-coded
> double loop. With for_each_numa_cpu() it shouldn't be a brainteaser.
> Consider the NUMA-ignorant example:
>
>         cpumask_t cpus = get_mask();
>         int cnt = 0, cpu;
>
>         for_each_cpu(cpu, cpus) {
>                 if (cnt++ == max_num)
>                         break;
>                 do_something(cpu);
>         }
>
> Converting it to NUMA-aware version would be as simple as:
>
>         cpumask_t cpus = get_mask();
>         int node = get_node();
>         int cnt = 0, hop, cpu;
>
>         for_each_numa_cpu(cpu, hop, node, cpus) {
>                 if (cnt++ == max_num)
>                         break;
>                 do_something(cpu);
>         }
>
> The latter looks a bit more verbose, but avoids open-coding that annoying
> double loop. Another advantage is that it works with a 'hop' parameter with
> the clear meaning of NUMA distance, and doesn't force people unfamiliar
> with the enumerator internals to bother with the current and previous
> masks machinery.
>

LGTM, I ran the tests on a few NUMA topologies and that all seems to behave
as expected. Thanks for working on this! 

Reviewed-by: Valentin Schneider <vschneid@redhat.com>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
@ 2023-04-30 17:18 Yury Norov
  2023-05-02 16:59 ` Valentin Schneider
  2023-05-31 15:43 ` Yury Norov
  0 siblings, 2 replies; 36+ messages in thread
From: Yury Norov @ 2023-04-30 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in the kernel, and it's beneficial to create
a NUMA-aware version of the macro.

Recently added for_each_numa_hop_mask() works, but switching existing
codebase to it is not an easy process.

This series adds for_each_numa_cpu(), which is designed to be similar to
for_each_cpu(). It allows converting existing code to a NUMA-aware version
as simply as adding a hop iterator variable and passing it into the new
macro. for_each_numa_cpu() takes care of the rest.

At the moment, we have 2 users of NUMA-aware enumerators. One is
Mellanox's in-tree driver, and another is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

        for_each_numa_hop_mask(cpus, prev, node) {
                for_each_cpu_andnot(cpu, cpus, prev) {
                        if (cnt++ == max_num)
                                goto out;
                        do_something(cpu);
                }
                prev = cpus;
        }

With the new macro, it has a more standard look, like this:

        for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Straight conversion of the existing for_each_cpu() codebase to a NUMA-aware
version with for_each_numa_hop_mask() is difficult because it doesn't
take a user-provided cpu mask, and eventually ends up with an open-coded
double loop. With for_each_numa_cpu() it shouldn't be a brainteaser.
Consider the NUMA-ignorant example:

        cpumask_t cpus = get_mask();
        int cnt = 0, cpu;

        for_each_cpu(cpu, cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Converting it to NUMA-aware version would be as simple as:

        cpumask_t cpus = get_mask();
        int node = get_node();
        int cnt = 0, hop, cpu;

        for_each_numa_cpu(cpu, hop, node, cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

The latter looks a bit more verbose, but avoids open-coding that annoying
double loop. Another advantage is that it works with a 'hop' parameter with
the clear meaning of NUMA distance, and doesn't force people unfamiliar
with the enumerator internals to bother with the current and previous
masks machinery.

v2: https://lore.kernel.org/netdev/ZD3l6FBnUh9vTIGc@yury-ThinkPad/T/
v3:
 - fix sched_numa_find_{next,nth}_cpu() when CONFIG_NUMA is off to
   only traverse online CPUs (see the sketch after this list);
 - don't export sched_domains_numa_levels for testing purposes. In
   the test, use for_each_node() macro;
 - extend the test for for_each_node();
 - in comments, mention that only online CPUs are traversed;
 - rebase on top of 6.3. 
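
As a sketch of the first changelog item above - an illustration of the
idea, not the literal patch - the !CONFIG_NUMA stubs could look like:

        #ifndef CONFIG_NUMA
        static __always_inline
        int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
        {
                /* No topology: just take the Nth online CPU in @cpus. */
                return cpumask_nth_and(cpu, cpus, cpu_online_mask);
        }

        static __always_inline
        int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu,
                                     int node, unsigned int *hop)
        {
                /* Single "hop": next online CPU in @cpus at or above @cpu. */
                return find_next_and_bit(cpumask_bits(cpus),
                                         cpumask_bits(cpu_online_mask),
                                         nr_cpu_ids, cpu);
        }
        #endif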

Yury Norov (8):
  sched: fix sched_numa_find_nth_cpu() in non-NUMA case
  lib/find: add find_next_and_andnot_bit()
  sched/topology: introduce sched_numa_find_next_cpu()
  sched/topology: add for_each_numa_{,online}_cpu() macro
  net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  lib/cpumask: update comment to cpumask_local_spread()
  sched: drop for_each_numa_hop_mask()
  lib: test for_each_numa_cpus()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 ++---
 include/linux/find.h                         | 43 ++++++++++++
 include/linux/topology.h                     | 40 ++++++-----
 kernel/sched/topology.c                      | 53 ++++++++-------
 lib/cpumask.c                                |  7 +-
 lib/find_bit.c                               | 12 ++++
 lib/test_bitmap.c                            | 70 +++++++++++++++++++-
 7 files changed, 183 insertions(+), 58 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-17 11:11 ` Peter Zijlstra
@ 2023-04-18  0:51   ` Yury Norov
  0 siblings, 0 replies; 36+ messages in thread
From: Yury Norov @ 2023-04-18  0:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Mon, Apr 17, 2023 at 01:11:37PM +0200, Peter Zijlstra wrote:
> 
> 
> Something went wrong with sending your patches; I only have 0/n.

Indeed. Thanks for pointing that out. I'll resend shortly.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-04-15  5:06 Yury Norov
@ 2023-04-17 11:11 ` Peter Zijlstra
  2023-04-18  0:51   ` Yury Norov
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2023-04-17 11:11 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song



Something went wrong with sending your patches; I only have 0/n.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro
@ 2023-04-15  5:06 Yury Norov
  2023-04-17 11:11 ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Yury Norov @ 2023-04-15  5:06 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in the kernel, and it's beneficial to create
a NUMA-aware version of the macro.

The recently added for_each_numa_hop_mask() works, but switching the
existing codebase to it is not an easy process.

This series adds for_each_numa_cpu(), which is designed to be similar to
for_each_cpu(). It makes converting existing code to be NUMA-aware as
simple as adding a hop iterator variable and passing it to the new macro.
for_each_numa_cpu() takes care of the rest.

At the moment, we have two users of NUMA-aware enumerators. One is
Mellanox's in-tree driver, and the other is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern. Each new hop mask is
a superset of the previous one, which is why the callers need the
'prev' bookkeeping and the inner andnot:

        for_each_numa_hop_mask(cpus, prev, node) {
                for_each_cpu_andnot(cpu, cpus, prev) {
                        if (cnt++ == max_num)
                                goto out;
                        do_something(cpu);
                }
                prev = cpus;
        }

With the new macro, the same pattern takes a more conventional shape:

        for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Straight conversion of the existing for_each_cpu() codebase to a
NUMA-aware version with for_each_numa_hop_mask() is difficult because it
doesn't take a user-provided cpu mask, and it eventually ends up as an
open-coded double loop. With for_each_numa_cpu() it shouldn't be a
brainteaser. Consider the NUMA-ignorant example:

        cpumask_t cpus = get_mask();
        int cnt = 0, cpu;

        for_each_cpu(cpu, &cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

Converting it to a NUMA-aware version would be as simple as:

        cpumask_t cpus = get_mask();
        int node = get_node();
        int cnt = 0, hop, cpu;

        for_each_numa_cpu(cpu, hop, node, &cpus) {
                if (cnt++ == max_num)
                        break;
                do_something(cpu);
        }

The latter is slightly more verbose, but it avoids open-coding that
annoying double loop. Another advantage is that it exposes a 'hop'
parameter with the clear meaning of NUMA distance, and doesn't force
people unfamiliar with the enumerator internals to bother with the
current and previous masks machinery.
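
For the curious: roughly, the macro can be built on top of
sched_numa_find_next_cpu() from patch 2. A sketch of the idea, not
necessarily the exact code in the patch:

        #define for_each_numa_cpu(cpu, hop, node, mask)                          \
                for ((cpu) = 0, (hop) = 0;                                        \
                     (cpu) = sched_numa_find_next_cpu(mask, cpu, node, &(hop)),  \
                     (cpu) < nr_cpu_ids;                                          \
                     (cpu)++)

That is, a plain for_each_cpu()-style loop, where the helper advances
'hop' whenever the CPUs of the current hop in 'mask' are exhausted.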

v2:
 - rebase on top of master;
 - clean up comments and tweak them to comply with kernel-doc;
 - remove RFC from patch #8 as there were no objections.

Yury Norov (8):
  lib/find: add find_next_and_andnot_bit()
  sched/topology: introduce sched_numa_find_next_cpu()
  sched/topology: add for_each_numa_cpu() macro
  net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  lib/cpumask: update comment to cpumask_local_spread()
  sched/topology: export sched_domains_numa_levels
  lib: add test for for_each_numa_{cpu,hop_mask}()
  sched: drop for_each_numa_hop_mask()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 ++----
 include/linux/find.h                         | 43 ++++++++++++++
 include/linux/topology.h                     | 37 ++++++------
 kernel/sched/topology.c                      | 59 +++++++++++---------
 lib/cpumask.c                                |  7 +--
 lib/find_bit.c                               | 12 ++++
 lib/test_bitmap.c                            | 16 ++++++
 7 files changed, 133 insertions(+), 57 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2023-05-31 17:36 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-20  5:19 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
2023-04-20  5:19 ` [PATCH v2 1/8] lib/find: add find_next_and_andnot_bit() Yury Norov
2023-04-20  5:19 ` [PATCH v2 2/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
2023-04-25  9:54   ` Valentin Schneider
2023-04-26  5:26     ` Yury Norov
2023-04-26  9:17       ` Valentin Schneider
2023-04-20  5:19 ` [PATCH v2 3/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
2023-04-25  9:54   ` Valentin Schneider
2023-04-26  5:32     ` Yury Norov
2023-04-26  9:17       ` Valentin Schneider
2023-04-20  5:19 ` [PATCH v2 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
2023-04-20  8:27   ` Tariq Toukan
2023-04-20 22:45     ` Yury Norov
2023-04-20  5:19 ` [PATCH v2 5/8] lib/cpumask: update comment to cpumask_local_spread() Yury Norov
2023-04-20  5:19 ` [PATCH v2 6/8] sched/topology: export sched_domains_numa_levels Yury Norov
2023-04-20  5:19 ` [PATCH v2 7/8] lib: add test for for_each_numa_{cpu,hop_mask}() Yury Norov
2023-04-24 17:09   ` Valentin Schneider
2023-04-26  5:50     ` Yury Norov
2023-04-26  9:17       ` Valentin Schneider
2023-04-26 20:51         ` Yury Norov
2023-04-27  9:35           ` Valentin Schneider
2023-04-20  5:19 ` [PATCH v2 8/8] sched: drop for_each_numa_hop_mask() Yury Norov
2023-04-20 10:15   ` kernel test robot
2023-04-21  8:38   ` kernel test robot
2023-04-21  8:38   ` kernel test robot
  -- strict thread matches above, loose matches on Subject: below --
2023-04-30 17:18 [PATCH v2 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
2023-05-02 16:59 ` Valentin Schneider
2023-05-02 21:58   ` Yury Norov
2023-05-03 10:00     ` Valentin Schneider
2023-05-31 15:43 ` Yury Norov
2023-05-31 17:01   ` Jakub Kicinski
2023-05-31 17:08     ` Yury Norov
2023-05-31 17:36       ` Jakub Kicinski
2023-04-15  5:06 Yury Norov
2023-04-17 11:11 ` Peter Zijlstra
2023-04-18  0:51   ` Yury Norov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.