linux-rdma.vger.kernel.org archive mirror
* [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro
@ 2023-03-25 18:55 Yury Norov
  2023-03-25 18:55 ` [PATCH 1/8] lib/find: add find_next_and_andnot_bit() Yury Norov
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Yury Norov @ 2023-03-25 18:55 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in the kernel, and it's beneficial to
create a NUMA-aware version of the macro.

The recently added for_each_numa_hop_mask() works, but switching the
existing codebase to it is not an easy process.

This series adds for_each_numa_cpu(), which is designed to be similar
to for_each_cpu(). Converting existing code to be NUMA-aware is as
simple as adding a hop iterator variable and passing it into the new
macro; for_each_numa_cpu() takes care of the rest.

At the moment, there are two users of the NUMA-aware enumerators. One
is Mellanox's in-tree driver, and the other is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

	for_each_numa_hop_mask(cpus, prev, node) {
 		for_each_cpu_andnot(cpu, cpus, prev) {
 			if (cnt++ == max_num)
 				goto out;
 			do_something(cpu);
 		}
		prev = cpus;
 	}

With the new macro, it has a more standard look, like this:

	for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
 	}

Straight conversion of the existing for_each_cpu() codebase to a
NUMA-aware version with for_each_numa_hop_mask() is difficult because it
doesn't take a user-provided cpu mask, and the conversion eventually
ends up as an open-coded double loop. With for_each_numa_cpu() it
shouldn't be a brainteaser. Consider the NUMA-ignorant example:

	cpumask_t cpus = get_mask();
	int cnt = 0, cpu;

	for_each_cpu(cpu, cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
 	}

Converting it to a NUMA-aware version is as simple as:

	cpumask_t cpus = get_mask();
	int node = get_node();
	int cnt = 0, hop, cpu;

	for_each_numa_cpu(cpu, hop, node, cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
 	}

The latter is only slightly more verbose and avoids open-coding that
annoying double loop. Another advantage is that it exposes a 'hop'
parameter with the clear meaning of NUMA distance, and doesn't force
people unfamiliar with the enumerator internals to bother with the
current/previous mask machinery.

Yury Norov (8):
  lib/find: add find_next_and_andnot_bit()
  sched/topology: introduce sched_numa_find_next_cpu()
  sched/topology: add for_each_numa_cpu() macro
  net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  lib/cpumask: update comment to cpumask_local_spread()
  sched/topology: export sched_domains_numa_levels
  lib: add test for for_each_numa_{cpu,hop_mask}()
  sched: drop for_each_numa_hop_mask()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 ++----
 include/linux/find.h                         | 43 ++++++++++++++
 include/linux/topology.h                     | 39 ++++++++-----
 kernel/sched/topology.c                      | 59 +++++++++++---------
 lib/cpumask.c                                |  7 +--
 lib/find_bit.c                               | 12 ++++
 lib/test_bitmap.c                            | 16 ++++++
 7 files changed, 136 insertions(+), 56 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/8] lib/find: add find_next_and_andnot_bit()
  2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
@ 2023-03-25 18:55 ` Yury Norov
  2023-03-27 10:26   ` Andy Shevchenko
  2023-03-25 18:55 ` [PATCH 2/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Yury Norov @ 2023-03-25 18:55 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Similarly to find_nth_and_andnot_bit(), find_next_and_andnot_bit() is
a convenient helper that allows traversing bitmaps without storing
intermediate results in a temporary bitmap.

In the following patches the function is used to implement NUMA-aware
CPU enumeration.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/find.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 lib/find_bit.c       | 12 ++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index 4647864a5ffd..bde0ba9fa59b 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -14,6 +14,9 @@ unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long
 					unsigned long nbits, unsigned long start);
 unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
+unsigned long _find_next_and_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+					const unsigned long *addr3, unsigned long nbits,
+					unsigned long start);
 unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
 					 unsigned long start);
 extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
@@ -127,6 +130,40 @@ unsigned long find_next_andnot_bit(const unsigned long *addr1,
 }
 #endif
 
+#ifndef find_next_and_andnot_bit
+/**
+ * find_next_and_andnot_bit - find the next bit set in *addr1 and *addr2,
+ *			      excluding all the bits in *addr3
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @addr3: The third address to base the search on
+ * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns the bit number for the next set bit.
+ * If no bits are set, returns @size.
+ */
+static __always_inline
+unsigned long find_next_and_andnot_bit(const unsigned long *addr1,
+				   const unsigned long *addr2,
+				   const unsigned long *addr3,
+				   unsigned long size,
+				   unsigned long offset)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr1 & *addr2 & ~*addr3 & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_next_and_andnot_bit(addr1, addr2, addr3, size, offset);
+}
+#endif
+
 #ifndef find_next_zero_bit
 /**
  * find_next_zero_bit - find the next cleared bit in a memory region
@@ -536,6 +573,12 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 	     (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
 	     (bit)++)
 
+#define for_each_and_andnot_bit(bit, addr1, addr2, addr3, size) \
+	for ((bit) = 0;									\
+	     (bit) = find_next_and_andnot_bit((addr1), (addr2), (addr3), (size), (bit)),\
+	     (bit) < (size);								\
+	     (bit)++)
+
 /* same as for_each_set_bit() but use bit as value to start with */
 #define for_each_set_bit_from(bit, addr, size) \
 	for (; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)
diff --git a/lib/find_bit.c b/lib/find_bit.c
index c10920e66788..8e2a6b87262f 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -182,6 +182,18 @@ unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned l
 EXPORT_SYMBOL(_find_next_andnot_bit);
 #endif
 
+#ifndef find_next_and_andnot_bit
+unsigned long _find_next_and_andnot_bit(const unsigned long *addr1,
+					const unsigned long *addr2,
+					const unsigned long *addr3,
+					unsigned long nbits,
+					unsigned long start)
+{
+	return FIND_NEXT_BIT(addr1[idx] & addr2[idx] & ~addr3[idx], /* nop */, nbits, start);
+}
+EXPORT_SYMBOL(_find_next_and_andnot_bit);
+#endif
+
 #ifndef find_next_zero_bit
 unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
 					 unsigned long start)
-- 
2.34.1



* [PATCH 2/8] sched/topology: introduce sched_numa_find_next_cpu()
  2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
  2023-03-25 18:55 ` [PATCH 1/8] lib/find: add find_next_and_andnot_bit() Yury Norov
@ 2023-03-25 18:55 ` Yury Norov
  2023-03-27 10:28   ` Andy Shevchenko
  2023-03-25 18:55 ` [PATCH 3/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Yury Norov @ 2023-03-25 18:55 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

The function searches for the next CPU in a given cpumask according to
the NUMA topology, so that CPUs are traversed hop by hop.

If the CPU is the last one in a given hop, sched_numa_find_next_cpu()
switches to the next hop and picks the first CPU from there, excluding
those already traversed.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h |  7 +++++++
 kernel/sched/topology.c  | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index fea32377f7c7..4a63154fa036 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -247,6 +247,7 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 
 #ifdef CONFIG_NUMA
 int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
+int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop);
 extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
 #else
 static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
@@ -254,6 +255,12 @@ static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, i
 	return cpumask_nth(cpu, cpus);
 }
 
+static __always_inline
+int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop)
+{
+	return find_next_bit(cpumask_bits(cpus), small_cpumask_bits, cpu);
+}
+
 static inline const struct cpumask *
 sched_numa_hop_mask(unsigned int node, unsigned int hops)
 {
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 051aaf65c749..1860d9487fe1 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2130,6 +2130,45 @@ int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 }
 EXPORT_SYMBOL_GPL(sched_numa_find_nth_cpu);
 
+/**
+ * sched_numa_find_next_cpu() - given the NUMA topology, find the next cpu
+ * @cpus: cpumask to find a cpu from
+ * @cpu: current cpu
+ * @node: local node
+ * @hop: (in/out) indicates distance order of current CPU to a local node
+ *
+ * The function searches for the next cpu at a given NUMA distance,
+ * indicated by hop, and if nothing is found, tries to find CPUs at a
+ * greater distance, starting from the beginning.
+ *
+ * Return: cpu, or >= nr_cpu_ids when nothing is found.
+ */
+int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop)
+{
+	unsigned long *cur, *prev;
+	struct cpumask ***masks;
+	unsigned int ret;
+
+	if (*hop >= sched_domains_numa_levels)
+		return nr_cpu_ids;
+
+	masks = rcu_dereference(sched_domains_numa_masks);
+	cur = cpumask_bits(masks[*hop][node]);
+	if (*hop == 0)
+		ret = find_next_and_bit(cpumask_bits(cpus), cur, nr_cpu_ids, cpu);
+	else {
+		prev = cpumask_bits(masks[*hop - 1][node]);
+		ret = find_next_and_andnot_bit(cpumask_bits(cpus), cur, prev, nr_cpu_ids, cpu);
+	}
+
+	if (ret < nr_cpu_ids)
+		return ret;
+
+	*hop += 1;
+	return sched_numa_find_next_cpu(cpus, 0, node, hop);
+}
+EXPORT_SYMBOL_GPL(sched_numa_find_next_cpu);
+
 /**
  * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
  *                         @node
-- 
2.34.1



* [PATCH 3/8] sched/topology: add for_each_numa_cpu() macro
  2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
  2023-03-25 18:55 ` [PATCH 1/8] lib/find: add find_next_and_andnot_bit() Yury Norov
  2023-03-25 18:55 ` [PATCH 2/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
@ 2023-03-25 18:55 ` Yury Norov
  2023-04-10 18:05   ` Yury Norov
  2023-03-25 18:55 ` [PATCH 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Yury Norov @ 2023-03-25 18:55 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_cpu() is widely used in the kernel, and it's beneficial to
create a NUMA-aware version of the macro.

The recently added for_each_numa_hop_mask() works, but switching the
existing codebase to it is not an easy process.

The new for_each_numa_cpu() is designed to be similar to for_each_cpu().
Converting existing code to be NUMA-aware is as simple as adding a hop
iterator variable and passing it into the new macro; for_each_numa_cpu()
takes care of the rest.

At the moment, there are two users of the NUMA-aware enumerators. One
is Mellanox's in-tree driver, and the other is Intel's in-review driver:

https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Both real-life examples follow the same pattern:

	for_each_numa_hop_mask(cpus, prev, node) {
 		for_each_cpu_andnot(cpu, cpus, prev) {
 			if (cnt++ == max_num)
 				goto out;
 			do_something(cpu);
 		}
		prev = cpus;
 	}

With the new macro, it would look like this:

	for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
 	}

Straight conversion of the existing for_each_cpu() codebase to a
NUMA-aware version with for_each_numa_hop_mask() is difficult because it
doesn't take a user-provided cpu mask, and the conversion eventually
ends up as an open-coded double loop. With for_each_numa_cpu() it
shouldn't be a brainteaser. Consider the NUMA-ignorant example:

	cpumask_t cpus = get_mask();
	int cnt = 0, cpu;

	for_each_cpu(cpu, cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
 	}

Converting it to a NUMA-aware version is as simple as:

	cpumask_t cpus = get_mask();
	int node = get_node();
	int cnt = 0, hop, cpu;

	for_each_numa_cpu(cpu, hop, node, cpus) {
		if (cnt++ == max_num)
			break;
		do_something(cpu);
 	}

The latter is only slightly more verbose and avoids open-coding that
annoying double loop. Another advantage is that it exposes a 'hop'
parameter with the clear meaning of NUMA distance, and doesn't force
people unfamiliar with the enumerator internals to bother with the
current/previous mask machinery.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4a63154fa036..62a9dd8edd77 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -286,4 +286,24 @@ sched_numa_hop_mask(unsigned int node, unsigned int hops)
 	     !IS_ERR_OR_NULL(mask);					       \
 	     __hops++)
 
+/**
+ * for_each_numa_cpu - iterate over cpus in increasing order taking into account
+ *		       NUMA distances from a given node.
+ * @cpu: the (optionally unsigned) integer iterator
+ * @hop: the iterator variable, must be initialized to a desired minimal hop.
+ * @node: the NUMA node to start the search from.
+ * @mask: the cpumask to iterate over.
+ *
+ * Requires rcu_lock to be held.
+ *
+ * Unlike for_each_numa_hop_mask(), this is a single loop, so 'break' works
+ * inside the body just like in plain for_each_cpu().
+ *
+ * Yields the intersection of @mask and cpu_online_mask if @node == NUMA_NO_NODE.
+ */
+#define for_each_numa_cpu(cpu, hop, node, mask)					\
+	for ((cpu) = 0, (hop) = 0;						\
+		(cpu) = sched_numa_find_next_cpu((mask), (cpu), (node), &(hop)),\
+		(cpu) < nr_cpu_ids;						\
+		(cpu)++)
+
 #endif /* _LINUX_TOPOLOGY_H */
-- 
2.34.1



* [PATCH 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu
  2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (2 preceding siblings ...)
  2023-03-25 18:55 ` [PATCH 3/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
@ 2023-03-25 18:55 ` Yury Norov
  2023-03-25 18:55 ` [PATCH 5/8] lib/cpumask: update comment to cpumask_local_spread() Yury Norov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2023-03-25 18:55 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

for_each_numa_cpu() is a more straightforward alternative to
for_each_numa_hop_mask() + for_each_cpu_andnot().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 38b32e98f3bd..80368952e9b1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -817,12 +817,10 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
 static int comp_irqs_request(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = dev->priv.eq_table;
-	const struct cpumask *prev = cpu_none_mask;
-	const struct cpumask *mask;
 	int ncomp_eqs = table->num_comp_eqs;
 	u16 *cpus;
 	int ret;
-	int cpu;
+	int cpu, hop;
 	int i;
 
 	ncomp_eqs = table->num_comp_eqs;
@@ -844,15 +842,11 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
 
 	i = 0;
 	rcu_read_lock();
-	for_each_numa_hop_mask(mask, dev->priv.numa_node) {
-		for_each_cpu_andnot(cpu, mask, prev) {
-			cpus[i] = cpu;
-			if (++i == ncomp_eqs)
-				goto spread_done;
-		}
-		prev = mask;
+	for_each_numa_cpu(cpu, hop, dev->priv.numa_node, cpu_possible_mask) {
+		cpus[i] = cpu;
+		if (++i == ncomp_eqs)
+			break;
 	}
-spread_done:
 	rcu_read_unlock();
 	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
 	kfree(cpus);
-- 
2.34.1



* [PATCH 5/8] lib/cpumask: update comment to cpumask_local_spread()
  2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (3 preceding siblings ...)
  2023-03-25 18:55 ` [PATCH 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
@ 2023-03-25 18:55 ` Yury Norov
  2023-03-25 18:55 ` [PATCH 6/8] sched/topology: export sched_domains_numa_levels Yury Norov
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2023-03-25 18:55 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Now that we have for_each_numa_cpu(), which is a more straightforward
replacement for cpumask_local_spread() when it comes to enumerating
CPUs with respect to NUMA topology, it's worth updating the comment.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/cpumask.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index e7258836b60b..151d1dc5c593 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -127,11 +127,8 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
  *
  * There's a better alternative based on for_each()-like iterators:
  *
- *	for_each_numa_hop_mask(mask, node) {
- *		for_each_cpu_andnot(cpu, mask, prev)
- *			do_something(cpu);
- *		prev = mask;
- *	}
+ *	for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
+ *		do_something(cpu);
  *
  * It's simpler and more verbose than above. Complexity of iterator-based
  * enumeration is O(sched_domains_numa_levels * nr_cpu_ids), while
-- 
2.34.1



* [PATCH 6/8] sched/topology: export sched_domains_numa_levels
  2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (4 preceding siblings ...)
  2023-03-25 18:55 ` [PATCH 5/8] lib/cpumask: update comment to cpumask_local_spread() Yury Norov
@ 2023-03-25 18:55 ` Yury Norov
  2023-03-25 18:55 ` [PATCH 7/8] lib: add test for for_each_numa_{cpu,hop_mask}() Yury Norov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2023-03-25 18:55 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

The following patch adds a test for the NUMA-aware CPU enumerators,
and it requires access to sched_domains_numa_levels.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/topology.h |  7 +++++++
 kernel/sched/topology.c  | 10 ++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 62a9dd8edd77..3d8d486c817d 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -43,6 +43,13 @@
 	for_each_online_node(node)			\
 		if (nr_cpus_node(node))
 
+#ifdef CONFIG_NUMA
+extern int __sched_domains_numa_levels;
+#define sched_domains_numa_levels ((const int)__sched_domains_numa_levels)
+#else
+#define sched_domains_numa_levels (1)
+#endif
+
 int arch_update_cpu_topology(void);
 
 /* Conform to ACPI 2.0 SLIT distance definitions */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 1860d9487fe1..5f5f994a56da 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1508,7 +1508,9 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
 #ifdef CONFIG_NUMA
 enum numa_topology_type sched_numa_topology_type;
 
-static int			sched_domains_numa_levels;
+int				__sched_domains_numa_levels;
+EXPORT_SYMBOL_GPL(__sched_domains_numa_levels);
+
 static int			sched_domains_curr_level;
 
 int				sched_max_numa_distance;
@@ -1872,7 +1874,7 @@ void sched_init_numa(int offline_node)
 	 *
 	 * We reset it to 'nr_levels' at the end of this function.
 	 */
-	sched_domains_numa_levels = 0;
+	__sched_domains_numa_levels = 0;
 
 	masks = kzalloc(sizeof(void *) * nr_levels, GFP_KERNEL);
 	if (!masks)
@@ -1948,7 +1950,7 @@ void sched_init_numa(int offline_node)
 	sched_domain_topology_saved = sched_domain_topology;
 	sched_domain_topology = tl;
 
-	sched_domains_numa_levels = nr_levels;
+	__sched_domains_numa_levels = nr_levels;
 	WRITE_ONCE(sched_max_numa_distance, sched_domains_numa_distance[nr_levels - 1]);
 
 	init_numa_topology_type(offline_node);
@@ -1961,7 +1963,7 @@ static void sched_reset_numa(void)
 	struct cpumask ***masks;
 
 	nr_levels = sched_domains_numa_levels;
-	sched_domains_numa_levels = 0;
+	__sched_domains_numa_levels = 0;
 	sched_max_numa_distance = 0;
 	sched_numa_topology_type = NUMA_DIRECT;
 	distances = sched_domains_numa_distance;
-- 
2.34.1



* [PATCH 7/8] lib: add test for for_each_numa_{cpu,hop_mask}()
  2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (5 preceding siblings ...)
  2023-03-25 18:55 ` [PATCH 6/8] sched/topology: export sched_domains_numa_levels Yury Norov
@ 2023-03-25 18:55 ` Yury Norov
  2023-03-25 18:55 ` [RFC PATCH 8/8] sched: drop for_each_numa_hop_mask() Yury Norov
  2023-04-10 18:09 ` [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
  8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2023-03-25 18:55 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

The test ensures that the enumerators' output is consistent with
cpumask_local_spread().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/test_bitmap.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index a8005ad3bd58..1b5f805f6879 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -12,6 +12,7 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/topology.h>
 #include <linux/uaccess.h>
 
 #include "../tools/testing/selftests/kselftest_module.h"
@@ -751,6 +752,33 @@ static void __init test_for_each_set_bit_wrap(void)
 	}
 }
 
+static void __init test_for_each_numa(void)
+{
+	unsigned int cpu, node;
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		const struct cpumask *m, *p = cpu_none_mask;
+		unsigned int c = 0;
+
+		rcu_read_lock();
+		for_each_numa_hop_mask(m, node) {
+			for_each_cpu_andnot(cpu, m, p)
+				expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+			p = m;
+		}
+		rcu_read_unlock();
+	}
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		unsigned int hop, c = 0;
+
+		rcu_read_lock();
+		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
+			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+		rcu_read_unlock();
+	}
+}
+
 static void __init test_for_each_set_bit(void)
 {
 	DECLARE_BITMAP(orig, 500);
@@ -1237,6 +1265,7 @@ static void __init selftest(void)
 	test_for_each_clear_bitrange_from();
 	test_for_each_set_clump8();
 	test_for_each_set_bit_wrap();
+	test_for_each_numa();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
-- 
2.34.1



* [RFC PATCH 8/8] sched: drop for_each_numa_hop_mask()
  2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (6 preceding siblings ...)
  2023-03-25 18:55 ` [PATCH 7/8] lib: add test for for_each_numa_{cpu,hop_mask}() Yury Norov
@ 2023-03-25 18:55 ` Yury Norov
  2023-04-10 18:09 ` [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
  8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2023-03-25 18:55 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Yury Norov, Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

Now that we have for_each_numa_cpu(), for_each_numa_hop_mask() and all
related code is dead code. Drop it.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
This is an RFC because, although the code below is unused, there may be
new users, particularly Intel, and I don't want to cut people off
mid-flight. Those interested in keeping for_each_numa_hop_mask(), please
send NAKs.

 include/linux/topology.h | 25 -------------------------
 kernel/sched/topology.c  | 32 --------------------------------
 lib/test_bitmap.c        | 13 -------------
 3 files changed, 70 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 3d8d486c817d..d2defd59b2d0 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -255,7 +255,6 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 #ifdef CONFIG_NUMA
 int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
 int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsigned int *hop);
-extern const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops);
 #else
 static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
 {
@@ -267,32 +266,8 @@ int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsi
 {
 	return cpumask_next(cpu, cpus);
 }
-
-static inline const struct cpumask *
-sched_numa_hop_mask(unsigned int node, unsigned int hops)
-{
-	return ERR_PTR(-EOPNOTSUPP);
-}
 #endif	/* CONFIG_NUMA */
 
-/**
- * for_each_numa_hop_mask - iterate over cpumasks of increasing NUMA distance
- *                          from a given node.
- * @mask: the iteration variable.
- * @node: the NUMA node to start the search from.
- *
- * Requires rcu_lock to be held.
- *
- * Yields cpu_online_mask for @node == NUMA_NO_NODE.
- */
-#define for_each_numa_hop_mask(mask, node)				       \
-	for (unsigned int __hops = 0;					       \
-	     mask = (node != NUMA_NO_NODE || __hops) ?			       \
-		     sched_numa_hop_mask(node, __hops) :		       \
-		     cpu_online_mask,					       \
-	     !IS_ERR_OR_NULL(mask);					       \
-	     __hops++)
-
 /**
  * for_each_numa_cpu - iterate over cpus in increasing order taking into account
  *		       NUMA distances from a given node.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5f5f994a56da..2842a4d10624 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2171,38 +2171,6 @@ int sched_numa_find_next_cpu(const struct cpumask *cpus, int cpu, int node, unsi
 }
 EXPORT_SYMBOL_GPL(sched_numa_find_next_cpu);
 
-/**
- * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away from
- *                         @node
- * @node: The node to count hops from.
- * @hops: Include CPUs up to that many hops away. 0 means local node.
- *
- * Return: On success, a pointer to a cpumask of CPUs at most @hops away from
- * @node, an error value otherwise.
- *
- * Requires rcu_lock to be held. Returned cpumask is only valid within that
- * read-side section, copy it if required beyond that.
- *
- * Note that not all hops are equal in distance; see sched_init_numa() for how
- * distances and masks are handled.
- * Also note that this is a reflection of sched_domains_numa_masks, which may change
- * during the lifetime of the system (offline nodes are taken out of the masks).
- */
-const struct cpumask *sched_numa_hop_mask(unsigned int node, unsigned int hops)
-{
-	struct cpumask ***masks;
-
-	if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
-		return ERR_PTR(-EINVAL);
-
-	masks = rcu_dereference(sched_domains_numa_masks);
-	if (!masks)
-		return ERR_PTR(-EBUSY);
-
-	return masks[hops][node];
-}
-EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
-
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 1b5f805f6879..6becb044a66f 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -756,19 +756,6 @@ static void __init test_for_each_numa(void)
 {
 	unsigned int cpu, node;
 
-	for (node = 0; node < sched_domains_numa_levels; node++) {
-		const struct cpumask *m, *p = cpu_none_mask;
-		unsigned int c = 0;
-
-		rcu_read_lock();
-		for_each_numa_hop_mask(m, node) {
-			for_each_cpu_andnot(cpu, m, p)
-				expect_eq_uint(cpumask_local_spread(c++, node), cpu);
-			p = m;
-		}
-		rcu_read_unlock();
-	}
-
 	for (node = 0; node < sched_domains_numa_levels; node++) {
 		unsigned int hop, c = 0;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/8] lib/find: add find_next_and_andnot_bit()
  2023-03-25 18:55 ` [PATCH 1/8] lib/find: add find_next_and_andnot_bit() Yury Norov
@ 2023-03-27 10:26   ` Andy Shevchenko
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Shevchenko @ 2023-03-27 10:26 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Rasmus Villemoes, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Sat, Mar 25, 2023 at 11:55:07AM -0700, Yury Norov wrote:
> Similarly to find_nth_and_andnot_bit(), find_next_and_andnot_bit() is
> a convenient helper that allows traversing bitmaps without storing
> intermediate results in a temporary bitmap.
> 
> In the following patches the function is used to implement NUMA-aware
> CPUs enumeration.

...

> +/**
> + * find_next_and_andnot_bit - find the next bit set in *addr1 and *addr2,
> + *			      excluding all the bits in *addr3
> + * @addr1: The first address to base the search on
> + * @addr2: The second address to base the search on
> + * @addr3: The third address to base the search on
> + * @size: The bitmap size in bits
> + * @offset: The bitnumber to start searching at

> + * Returns the bit number for the next set bit
> + * If no bits are set, returns @size.

`kernel-doc -v` nowadays complains about absence of the Return: section.
Can we start providing it in the expected format?

Ditto for other documentation excerpts (old and new).

> + */

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH 2/8] sched/topology: introduce sched_numa_find_next_cpu()
  2023-03-25 18:55 ` [PATCH 2/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
@ 2023-03-27 10:28   ` Andy Shevchenko
  2023-04-10 18:00     ` Yury Norov
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Shevchenko @ 2023-03-27 10:28 UTC (permalink / raw)
  To: Yury Norov
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Rasmus Villemoes, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Sat, Mar 25, 2023 at 11:55:08AM -0700, Yury Norov wrote:
> The function searches for the next CPU in a given cpumask according to
> NUMA topology, so that it traverses cpus per-hop.
> 
> If the CPU is the last cpu in a given hop, sched_numa_find_next_cpu()
> switches to the next hop, and picks the first CPU from there, excluding
> those already traversed.

...

> +/*

Hmm... Is it deliberately not a kernel doc?

> + * sched_numa_find_next_cpu() - given the NUMA topology, find the next cpu
> + * cpumask: cpumask to find a cpu from
> + * cpu: current cpu
> + * node: local node
> + * hop: (in/out) indicates distance order of current CPU to a local node
> + *
> + * The function searches for next cpu at a given NUMA distance, indicated
> + * by hop, and if nothing found, tries to find CPUs at a greater distance,
> + * starting from the beginning.
> + *
> + * returns: cpu, or >= nr_cpu_ids when nothing found.
> + */

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH 2/8] sched/topology: introduce sched_numa_find_next_cpu()
  2023-03-27 10:28   ` Andy Shevchenko
@ 2023-04-10 18:00     ` Yury Norov
  0 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2023-04-10 18:00 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Jakub Kicinski, netdev, linux-rdma, linux-kernel, Saeed Mahameed,
	Pawel Chmielewski, Leon Romanovsky, David S. Miller,
	Eric Dumazet, Paolo Abeni, Rasmus Villemoes, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Tariq Toukan,
	Gal Pressman, Greg Kroah-Hartman, Heiko Carstens, Barry Song

On Mon, Mar 27, 2023 at 01:28:12PM +0300, Andy Shevchenko wrote:
> On Sat, Mar 25, 2023 at 11:55:08AM -0700, Yury Norov wrote:
> > The function searches for the next CPU in a given cpumask according to
> > NUMA topology, so that it traverses cpus per-hop.
> > 
> > If the CPU is the last cpu in a given hop, sched_numa_find_next_cpu()
> > switches to the next hop, and picks the first CPU from there, excluding
> > those already traversed.
> 
> ...
> 
> > +/*
> 
> Hmm... Is it deliberately not a kernel doc?

Yes, I'd prefer to encourage people to use the for_each() approach
instead of calling it directly.

If there is ever a good reason to make it a more self-contained thing,
we'll have to add a wrapper, just like sched_numa_find_nth_cpu() is
wrapped with cpumask_local_spread(). In particular, it would take the
RCU lock and properly handle NUMA_NO_NODE.
 
> > + * sched_numa_find_next_cpu() - given the NUMA topology, find the next cpu
> > + * cpumask: cpumask to find a cpu from
> > + * cpu: current cpu
> > + * node: local node
> > + * hop: (in/out) indicates distance order of current CPU to a local node
> > + *
> > + * The function searches for next cpu at a given NUMA distance, indicated
> > + * by hop, and if nothing found, tries to find CPUs at a greater distance,
> > + * starting from the beginning.
> > + *
> > + * returns: cpu, or >= nr_cpu_ids when nothing found.
> > + */
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 


* Re: [PATCH 3/8] sched/topology: add for_each_numa_cpu() macro
  2023-03-25 18:55 ` [PATCH 3/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
@ 2023-04-10 18:05   ` Yury Norov
  0 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2023-04-10 18:05 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

On Sat, Mar 25, 2023 at 11:55:09AM -0700, Yury Norov wrote:
> for_each_cpu() is widely used in the kernel, and it's beneficial to
> create a NUMA-aware version of the macro.
> 
> Recently added for_each_numa_hop_mask() works, but switching existing
> codebase to it is not an easy process.
> 
> New for_each_numa_cpu() is designed to be similar to for_each_cpu().
> It makes converting existing code NUMA-aware as simple as adding a hop
> iterator variable and passing it to the new macro. for_each_numa_cpu()
> takes care of the rest.
> 
> At the moment, we have 2 users of NUMA-aware enumerators. One is
> Mellanox's in-tree driver, and the other is Intel's in-review driver:
> 
> https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/
> 
> Both real-life examples follow the same pattern:
> 
> 	for_each_numa_hop_mask(cpus, prev, node) {
>  		for_each_cpu_andnot(cpu, cpus, prev) {
>  			if (cnt++ == max_num)
>  				goto out;
>  			do_something(cpu);
>  		}
> 		prev = cpus;
>  	}
> 
> With the new macro, it would look like this:
> 
> 	for_each_numa_cpu(cpu, hop, node, cpu_possible_mask) {
> 		if (cnt++ == max_num)
> 			break;
> 		do_something(cpu);
>  	}
> 
> Straight conversion of the existing for_each_cpu() codebase to a
> NUMA-aware version with for_each_numa_hop_mask() is difficult because
> it doesn't take a user-provided cpu mask, and eventually ends up with
> an open-coded double loop. With for_each_numa_cpu() it shouldn't be a
> brainteaser.
> Consider the NUMA-ignorant example:
> 
> 	cpumask_t cpus = get_mask();
> 	int cnt = 0, cpu;
> 
> 	for_each_cpu(cpu, cpus) {
> 		if (cnt++ == max_num)
> 			break;
> 		do_something(cpu);
>  	}
> 
> Converting it to the NUMA-aware version is as simple as:
> 
> 	cpumask_t cpus = get_mask();
> 	int node = get_node();
> 	int cnt = 0, hop, cpu;
> 
> 	for_each_numa_cpu(cpu, hop, node, cpus) {
> 		if (cnt++ == max_num)
> 			break;
> 		do_something(cpu);
>  	}
> 
> The latter is only slightly more verbose, and avoids open-coding that
> annoying double loop. Another advantage is that it takes a 'hop'
> parameter with the clear meaning of NUMA distance, and doesn't force
> people unfamiliar with the enumerator internals to bother with the
> current/previous masks machinery.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  include/linux/topology.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 4a63154fa036..62a9dd8edd77 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -286,4 +286,24 @@ sched_numa_hop_mask(unsigned int node, unsigned int hops)
>  	     !IS_ERR_OR_NULL(mask);					       \
>  	     __hops++)
>  
> +/**
> + * for_each_numa_cpu - iterate over cpus in increasing order taking into account
> + *		       NUMA distances from a given node.
> + * @cpu: the (optionally unsigned) integer iterator
> + * @hop: the iterator variable, must be initialized to a desired minimal hop.
> + * @node: the NUMA node to start the search from.
> + *
> + * Requires rcu_lock to be held.

The comments below are incorrect (copy-paste error). I'll remove them in v2.

> + *
> + * Because it's implemented as double-loop, using 'break' inside the body of
> + * iterator may lead to undefined behaviour. Use 'goto' instead.
> + *
> + * Yields intersection of @mask and cpu_online_mask if @node == NUMA_NO_NODE.
> + */
> +#define for_each_numa_cpu(cpu, hop, node, mask)					\
> +	for ((cpu) = 0, (hop) = 0;						\
> +		(cpu) = sched_numa_find_next_cpu((mask), (cpu), (node), &(hop)),\
> +		(cpu) < nr_cpu_ids;						\
> +		(cpu)++)
> +
>  #endif /* _LINUX_TOPOLOGY_H */
> -- 
> 2.34.1


* Re: [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro
  2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
                   ` (7 preceding siblings ...)
  2023-03-25 18:55 ` [RFC PATCH 8/8] sched: drop for_each_numa_hop_mask() Yury Norov
@ 2023-04-10 18:09 ` Yury Norov
  8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2023-04-10 18:09 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, linux-rdma, linux-kernel
  Cc: Saeed Mahameed, Pawel Chmielewski, Leon Romanovsky,
	David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Tariq Toukan, Gal Pressman, Greg Kroah-Hartman, Heiko Carstens,
	Barry Song

On Sat, Mar 25, 2023 at 11:55:06AM -0700, Yury Norov wrote:
> for_each_cpu() is widely used in the kernel, and it's beneficial to create
> a NUMA-aware version of the macro.
> 
> Recently added for_each_numa_hop_mask() works, but switching existing
> codebase to it is not an easy process.
> 
> This series adds for_each_numa_cpu(), which is designed to be similar
> to for_each_cpu(). It makes converting existing code NUMA-aware as
> simple as adding a hop iterator variable and passing it to the new
> macro. for_each_numa_cpu() takes care of the rest.
> 
> At the moment, we have 2 users of NUMA-aware enumerators. One is
> Mellanox's in-tree driver, and the other is Intel's in-review driver:
> 
> https://lore.kernel.org/lkml/20230216145455.661709-1-pawel.chmielewski@intel.com/

Are there any more comments on the series? If not, I'll address those
shared by Andy and send v2.

Thanks,
Yury


end of thread, other threads:[~2023-04-10 18:09 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-25 18:55 [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
2023-03-25 18:55 ` [PATCH 1/8] lib/find: add find_next_and_andnot_bit() Yury Norov
2023-03-27 10:26   ` Andy Shevchenko
2023-03-25 18:55 ` [PATCH 2/8] sched/topology: introduce sched_numa_find_next_cpu() Yury Norov
2023-03-27 10:28   ` Andy Shevchenko
2023-04-10 18:00     ` Yury Norov
2023-03-25 18:55 ` [PATCH 3/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
2023-04-10 18:05   ` Yury Norov
2023-03-25 18:55 ` [PATCH 4/8] net: mlx5: switch comp_irqs_request() to using for_each_numa_cpu Yury Norov
2023-03-25 18:55 ` [PATCH 5/8] lib/cpumask: update comment to cpumask_local_spread() Yury Norov
2023-03-25 18:55 ` [PATCH 6/8] sched/topology: export sched_domains_numa_levels Yury Norov
2023-03-25 18:55 ` [PATCH 7/8] lib: add test for for_each_numa_{cpu,hop_mask}() Yury Norov
2023-03-25 18:55 ` [RFC PATCH 8/8] sched: drop for_each_numa_hop_mask() Yury Norov
2023-04-10 18:09 ` [PATCH 0/8] sched/topology: add for_each_numa_cpu() macro Yury Norov
