* [PATCH v3 0/7] lib/group_cpus: rework grp_spread_init_one() and make it O(1)
@ 2023-12-12  4:21 Yury Norov
  2023-12-12  4:21 ` [PATCH v3 1/7] cpumask: introduce for_each_cpu_and_from() Yury Norov
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Yury Norov @ 2023-12-12  4:21 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

The grp_spread_init_one() implementation is sub-optimal because it
traverses bitmaps from the beginning each time, instead of continuing
from the position reached on the previous iteration.

Fix it and use the find_bit API where appropriate. While here, optimize
cpumask allocation and drop an unneeded cpumask_empty() call.

---
v1: https://lore.kernel.org/all/ZW5MI3rKQueLM0Bz@yury-ThinkPad/T/
v2: https://lore.kernel.org/lkml/ZXKNVRu3AfvjaFhK@fedora/T/
v3:
 - swap patches #2 and #3 @ Ming Lei;
 - add patch #7, which simplifies the function further.


Yury Norov (7):
  cpumask: introduce for_each_cpu_and_from()
  lib/group_cpus: optimize inner loop in grp_spread_init_one()
  lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  lib/group_cpus: optimize outer loop in grp_spread_init_one()
  lib/group_cpus.c: don't zero cpumasks in group_cpus_evenly() on
    allocation
  lib/group_cpus.c: drop unneeded cpumask_empty() call in
    __group_cpus_evenly()
  lib/group_cpus: simplify grp_spread_init_one() for more

 include/linux/cpumask.h | 11 ++++++++++
 include/linux/find.h    |  3 +++
 lib/group_cpus.c        | 47 +++++++++++++++++------------------------
 3 files changed, 33 insertions(+), 28 deletions(-)

-- 
2.40.1
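As a concrete illustration of the pattern the series removes, here is a
minimal sketch against the generic find-bit API. The helpers below are
hypothetical and only contrast the two scanning strategies; they are not
the actual group_cpus code, and they assume nbits > 0:

	#include <linux/bitops.h>
	#include <linux/find.h>

	/* O(N^2): restart the search from bit 0 for every bit moved */
	static void move_bits_quadratic(unsigned long *src, unsigned long *dst,
					unsigned int size, unsigned int nbits)
	{
		unsigned long bit;

		while (nbits--) {
			bit = find_first_bit(src, size);
			if (bit >= size)
				return;
			__clear_bit(bit, src);
			__set_bit(bit, dst);
		}
	}

	/* O(N): for_each_set_bit() resumes after the previous position */
	static void move_bits_linear(unsigned long *src, unsigned long *dst,
				     unsigned int size, unsigned int nbits)
	{
		unsigned long bit;

		for_each_set_bit(bit, src, size) {
			__clear_bit(bit, src);
			__set_bit(bit, dst);
			if (--nbits == 0)
				return;
		}
	}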



* [PATCH v3 1/7] cpumask: introduce for_each_cpu_and_from()
  2023-12-12  4:21 [PATCH v3 0/7] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
@ 2023-12-12  4:21 ` Yury Norov
  2023-12-12  4:21 ` [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-12-12  4:21 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

Similarly to for_each_cpu_and(), introduce for_each_cpu_and_from(),
which is handy when two cpumasks or bitmaps need to be traversed
starting from a given position.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/cpumask.h | 11 +++++++++++
 include/linux/find.h    |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index cfb545841a2c..73ff2e0ef090 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -332,6 +332,17 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta
 #define for_each_cpu_and(cpu, mask1, mask2)				\
 	for_each_and_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits)
 
+/**
+ * for_each_cpu_and_from - iterate over every cpu in both masks starting from a given cpu
+ * @cpu: the (optionally unsigned) integer iterator
+ * @mask1: the first cpumask pointer
+ * @mask2: the second cpumask pointer
+ *
+ * After the loop, cpu is >= nr_cpu_ids.
+ */
+#define for_each_cpu_and_from(cpu, mask1, mask2)				\
+	for_each_and_bit_from(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits)
+
 /**
  * for_each_cpu_andnot - iterate over every cpu present in one mask, excluding
  *			 those present in another.
diff --git a/include/linux/find.h b/include/linux/find.h
index 5e4f39ef2e72..dfd3d51ff590 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -563,6 +563,9 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 	     (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
 	     (bit)++)
 
+#define for_each_and_bit_from(bit, addr1, addr2, size) \
+	for (; (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size); (bit)++)
+
 #define for_each_andnot_bit(bit, addr1, addr2, size) \
 	for ((bit) = 0;									\
 	     (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
-- 
2.40.1
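A hedged usage sketch of the new iterator: unlike for_each_cpu_and(),
the _from variants have no initialization clause, so the caller must set
the iterator before the loop. The function and names below are
hypothetical:

	/* Count CPUs set in both masks at positions >= start */
	static unsigned int count_common_cpus_from(const struct cpumask *mask1,
						   const struct cpumask *mask2,
						   unsigned int start)
	{
		unsigned int cpu = start;	/* scan begins here, not at 0 */
		unsigned int n = 0;

		for_each_cpu_and_from(cpu, mask1, mask2)
			n++;

		return n;	/* cpu is now >= nr_cpu_ids */
	}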



* [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()
  2023-12-12  4:21 [PATCH v3 0/7] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
  2023-12-12  4:21 ` [PATCH v3 1/7] cpumask: introduce for_each_cpu_and_from() Yury Norov
@ 2023-12-12  4:21 ` Yury Norov
  2023-12-12  9:46   ` Ming Lei
  2023-12-12  4:21 ` [PATCH v3 3/7] lib/group_cpus: relax atomicity requirement " Yury Norov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-12-12  4:21 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

The loop starts from the beginning every time we switch to the next
sibling mask. This is the Schlemiel the Painter's style of coding
because we know for sure that nmsk is clear up to the current CPU, and we
can just continue from the next CPU.

Also, we can do it nicer if we leverage the dedicated for_each() iterator,
and simplify the logic of clearing a bit in nmsk.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/group_cpus.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index ee272c4cefcc..10dead3ab0e0 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 
 		/* If the cpu has siblings, use them first */
 		siblmsk = topology_sibling_cpumask(cpu);
-		for (sibl = -1; cpus_per_grp > 0; ) {
-			sibl = cpumask_next(sibl, siblmsk);
-			if (sibl >= nr_cpu_ids)
-				break;
-			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
-				continue;
+		sibl = cpu + 1;
+
+		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
+			cpumask_clear_cpu(sibl, nmsk);
 			cpumask_set_cpu(sibl, irqmsk);
-			cpus_per_grp--;
+			if (cpus_per_grp-- == 0)
+				return;
 		}
 	}
 }
-- 
2.40.1



* [PATCH v3 3/7] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  2023-12-12  4:21 [PATCH v3 0/7] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
  2023-12-12  4:21 ` [PATCH v3 1/7] cpumask: introduce for_each_cpu_and_from() Yury Norov
  2023-12-12  4:21 ` [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
@ 2023-12-12  4:21 ` Yury Norov
  2023-12-12  9:50   ` Ming Lei
  2023-12-12  4:21 ` [PATCH v3 4/7] lib/group_cpus: optimize outer loop " Yury Norov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-12-12  4:21 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

Because nmsk and irqmsk are stable, extra atomicity is not required.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/group_cpus.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 10dead3ab0e0..7ac94664230f 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -24,8 +24,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 		if (cpu >= nr_cpu_ids)
 			return;
 
-		cpumask_clear_cpu(cpu, nmsk);
-		cpumask_set_cpu(cpu, irqmsk);
+		__cpumask_clear_cpu(cpu, nmsk);
+		__cpumask_set_cpu(cpu, irqmsk);
 		cpus_per_grp--;
 
 		/* If the cpu has siblings, use them first */
@@ -33,8 +33,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 		sibl = cpu + 1;
 
 		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
-			cpumask_clear_cpu(sibl, nmsk);
-			cpumask_set_cpu(sibl, irqmsk);
+			__cpumask_clear_cpu(sibl, nmsk);
+			__cpumask_set_cpu(sibl, irqmsk);
 			if (cpus_per_grp-- == 0)
 				return;
 		}
-- 
2.40.1
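For context: cpumask_set_cpu() and cpumask_clear_cpu() wrap the atomic
set_bit()/clear_bit() operations, while the double-underscored versions
are plain non-atomic stores. A minimal sketch of the pattern the patch
relies on, with a hypothetical helper and masks assumed to be visible to
the current context only:

	/*
	 * Move one CPU between two masks that no other context can
	 * observe concurrently. Per-bit atomicity would not make the
	 * two-step move atomic anyway, so the plain helpers suffice.
	 */
	static void move_cpu(unsigned int cpu, struct cpumask *src,
			     struct cpumask *dst)
	{
		__cpumask_clear_cpu(cpu, src);
		__cpumask_set_cpu(cpu, dst);
	}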



* [PATCH v3 4/7] lib/group_cpus: optimize outer loop in grp_spread_init_one()
  2023-12-12  4:21 [PATCH v3 0/7] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
                   ` (2 preceding siblings ...)
  2023-12-12  4:21 ` [PATCH v3 3/7] lib/group_cpus: relax atomicity requirement " Yury Norov
@ 2023-12-12  4:21 ` Yury Norov
  2023-12-12  4:21 ` [PATCH v3 5/7] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation Yury Norov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-12-12  4:21 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

Similarly to the inner loop, in the outer loop we can use the
for_each_cpu() macro and skip CPUs that have already been copied.

With this patch, the function becomes O(1), even though it's a
double loop.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/group_cpus.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 7ac94664230f..cded3c8ea63b 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -17,16 +17,11 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 	const struct cpumask *siblmsk;
 	int cpu, sibl;
 
-	for ( ; cpus_per_grp > 0; ) {
-		cpu = cpumask_first(nmsk);
-
-		/* Should not happen, but I'm too lazy to think about it */
-		if (cpu >= nr_cpu_ids)
-			return;
-
+	for_each_cpu(cpu, nmsk) {
 		__cpumask_clear_cpu(cpu, nmsk);
 		__cpumask_set_cpu(cpu, irqmsk);
-		cpus_per_grp--;
+		if (cpus_per_grp-- == 0)
+			return;
 
 		/* If the cpu has siblings, use them first */
 		siblmsk = topology_sibling_cpumask(cpu);
@@ -37,6 +32,7 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 			__cpumask_set_cpu(sibl, irqmsk);
 			if (cpus_per_grp-- == 0)
 				return;
+			cpu = sibl + 1;
 		}
 	}
 }
-- 
2.40.1



* [PATCH v3 5/7] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation
  2023-12-12  4:21 [PATCH v3 0/7] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
                   ` (3 preceding siblings ...)
  2023-12-12  4:21 ` [PATCH v3 4/7] lib/group_cpus: optimize outer loop " Yury Norov
@ 2023-12-12  4:21 ` Yury Norov
  2023-12-13  0:56   ` Ming Lei
  2023-12-12  4:21 ` [PATCH v3 6/7] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly() Yury Norov
  2023-12-12  4:21 ` [PATCH v3 7/7] lib/group_cpus: simplify grp_spread_init_one() for more Yury Norov
  6 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-12-12  4:21 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

nmsk and npresmsk are both allocated with zalloc_cpumask_var(), but they
are initialized by copying later in the code, and so may be allocated
uninitialized.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/group_cpus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index cded3c8ea63b..c7fcd04c87bf 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -347,10 +347,10 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
 	int ret = -ENOMEM;
 	struct cpumask *masks = NULL;
 
-	if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
+	if (!alloc_cpumask_var(&nmsk, GFP_KERNEL))
 		return NULL;
 
-	if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
+	if (!alloc_cpumask_var(&npresmsk, GFP_KERNEL))
 		goto fail_nmsk;
 
 	node_to_cpumask = alloc_node_to_cpumask();
-- 
2.40.1
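A short sketch of the allocation idiom this patch switches to; the
surrounding code is hypothetical. zalloc_cpumask_var() hands back a
zeroed mask, while alloc_cpumask_var() leaves the contents undefined,
which is fine when the first use overwrites the mask completely:

	cpumask_var_t msk;

	if (!alloc_cpumask_var(&msk, GFP_KERNEL))	/* contents undefined */
		return NULL;

	cpumask_copy(msk, cpu_possible_mask);	/* first use fully initializes */
	/* ... use msk ... */
	free_cpumask_var(msk);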



* [PATCH v3 6/7] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly()
  2023-12-12  4:21 [PATCH v3 0/7] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
                   ` (4 preceding siblings ...)
  2023-12-12  4:21 ` [PATCH v3 5/7] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation Yury Norov
@ 2023-12-12  4:21 ` Yury Norov
  2023-12-13  0:59   ` Ming Lei
  2023-12-12  4:21 ` [PATCH v3 7/7] lib/group_cpus: simplify grp_spread_init_one() for more Yury Norov
  6 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-12-12  4:21 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

The function is called twice. The first time it's called with
cpu_present_mask as a parameter, which can't be empty. The second time it's
called with a mask created with cpumask_andnot(), which returns false if
the result is an empty mask.

We can safely drop the redundant cpumask_empty() call from
__group_cpus_evenly() and save a few cycles.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/group_cpus.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index c7fcd04c87bf..664a56171a1b 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -252,9 +252,6 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
 	nodemask_t nodemsk = NODE_MASK_NONE;
 	struct node_groups *node_groups;
 
-	if (cpumask_empty(cpu_mask))
-		return 0;
-
 	nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk);
 
 	/*
@@ -394,9 +391,14 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
 		curgrp = 0;
 	else
 		curgrp = nr_present;
-	cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk);
-	ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
-				  npresmsk, nmsk, masks);
+
+	if (cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk))
+		/* If npresmsk is not empty */
+		ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
+					  npresmsk, nmsk, masks);
+	else
+		ret = 0;
+
 	if (ret >= 0)
 		nr_others = ret;
 
-- 
2.40.1
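The idiom used above, in isolation: cpumask_andnot() stores
src1 & ~src2 in the destination and returns true only if the result has
a bit set, so a separate cpumask_empty() check folds into the call
itself. The names in this sketch are hypothetical:

	/* remaining = possible & ~handled; true iff remaining is non-empty */
	if (cpumask_andnot(remaining, cpu_possible_mask, handled))
		process(remaining);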



* [PATCH v3 7/7] lib/group_cpus: simplify grp_spread_init_one() for more
  2023-12-12  4:21 [PATCH v3 0/7] lib/group_cpus: rework grp_spread_init_one() and make it O(1) Yury Norov
                   ` (5 preceding siblings ...)
  2023-12-12  4:21 ` [PATCH v3 6/7] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly() Yury Norov
@ 2023-12-12  4:21 ` Yury Norov
  2023-12-13  1:06   ` Ming Lei
  6 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-12-12  4:21 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner, Ming Lei, linux-kernel
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes

The outer and inner loops of grp_spread_init_one() do the same thing -
move a bit from nmsk to irqmsk.

The inner loop iterates over the sibling group, which includes the CPU
picked by the outer loop. This means that we can drop the part that
moves the bit in the outer loop.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/group_cpus.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 664a56171a1b..7aa7a6289355 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -18,14 +18,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 	int cpu, sibl;
 
 	for_each_cpu(cpu, nmsk) {
-		__cpumask_clear_cpu(cpu, nmsk);
-		__cpumask_set_cpu(cpu, irqmsk);
-		if (cpus_per_grp-- == 0)
-			return;
-
-		/* If the cpu has siblings, use them first */
 		siblmsk = topology_sibling_cpumask(cpu);
-		sibl = cpu + 1;
+		sibl = cpu;
 
 		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
 			__cpumask_clear_cpu(sibl, nmsk);
-- 
2.40.1
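Why the outer-loop move becomes redundant: for_each_cpu() picks a cpu
that is set in nmsk, and with sibl = cpu the first position the inner
for_each_cpu_and_from() tests is cpu itself. Provided cpu is also set in
topology_sibling_cpumask(cpu), the inner loop performs exactly the
clear/set/count step that the deleted lines used to do; that proviso is
what the review below questions.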



* Re: [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()
  2023-12-12  4:21 ` [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one() Yury Norov
@ 2023-12-12  9:46   ` Ming Lei
  2023-12-12 17:04     ` Yury Norov
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2023-12-12  9:46 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes, ming.lei

On Mon, Dec 11, 2023 at 08:21:02PM -0800, Yury Norov wrote:
> The loop starts from the beginning every time we switch to the next
> sibling mask. This is the Schlemiel the Painter's style of coding
> because we know for sure that nmsk is clear up to the current CPU, and we
> can just continue from the next CPU.
> 
> Also, we can do it nicer if we leverage the dedicated for_each() iterator,
> and simplify the logic of clearing a bit in nmsk.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  lib/group_cpus.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index ee272c4cefcc..10dead3ab0e0 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>  
>  		/* If the cpu has siblings, use them first */
>  		siblmsk = topology_sibling_cpumask(cpu);
> -		for (sibl = -1; cpus_per_grp > 0; ) {
> -			sibl = cpumask_next(sibl, siblmsk);
> -			if (sibl >= nr_cpu_ids)
> -				break;
> -			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> -				continue;
> +		sibl = cpu + 1;

It doesn't have to be 'cpu + 1', see the comment below.

> +
> +		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> +			cpumask_clear_cpu(sibl, nmsk);
>  			cpumask_set_cpu(sibl, irqmsk);
> -			cpus_per_grp--;
> +			if (cpus_per_grp-- == 0)

			if (--cpus_per_grp == 0)

> +				return;

I think for_each_cpu_and() should work just fine, because cpu has been
cleared from nmsk, so the change can be something like the following;
then patch 1 isn't necessary.


		for_each_cpu_and(sibl, siblmsk, nmsk) {
			cpumask_clear_cpu(sibl, nmsk);
  			cpumask_set_cpu(sibl, irqmsk);
			if (--cpus_per_grp == 0)
				return;
		}


Thanks,
Ming
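To spell out the off-by-one caught above: suppose cpus_per_grp == 1 on
entry to the loop body. With the posted test the trace is

	move a bit;  cpus_per_grp-- == 0  ->  compares 1 == 0: false (now 0)
	move a bit;  cpus_per_grp-- == 0  ->  compares 0 == 0: true, return

so two CPUs are moved where one was requested, while --cpus_per_grp == 0
returns right after the first move, as the suggested fix does.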



* Re: [PATCH v3 3/7] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  2023-12-12  4:21 ` [PATCH v3 3/7] lib/group_cpus: relax atomicity requirement " Yury Norov
@ 2023-12-12  9:50   ` Ming Lei
  2023-12-12 16:52     ` Yury Norov
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2023-12-12  9:50 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Mon, Dec 11, 2023 at 08:21:03PM -0800, Yury Norov wrote:
> Because nmsk and irqmsk are stable, extra atomicity is not required.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  lib/group_cpus.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index 10dead3ab0e0..7ac94664230f 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -24,8 +24,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>  		if (cpu >= nr_cpu_ids)
>  			return;
>  
> -		cpumask_clear_cpu(cpu, nmsk);
> -		cpumask_set_cpu(cpu, irqmsk);
> +		__cpumask_clear_cpu(cpu, nmsk);
> +		__cpumask_set_cpu(cpu, irqmsk);
>  		cpus_per_grp--;
>  
>  		/* If the cpu has siblings, use them first */
> @@ -33,8 +33,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>  		sibl = cpu + 1;
>  
>  		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> -			cpumask_clear_cpu(sibl, nmsk);
> -			cpumask_set_cpu(sibl, irqmsk);
> +			__cpumask_clear_cpu(sibl, nmsk);
> +			__cpumask_set_cpu(sibl, irqmsk);

I think this kind of change should be avoided; here the code is
absolutely in the slow path, and we care about code cleanness and
readability much more than the cycles saved by non-atomicity.


Thanks,
Ming



* Re: [PATCH v3 3/7] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  2023-12-12  9:50   ` Ming Lei
@ 2023-12-12 16:52     ` Yury Norov
  2023-12-13  0:14       ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-12-12 16:52 UTC (permalink / raw)
  To: Ming Lei
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Tue, Dec 12, 2023 at 05:50:04PM +0800, Ming Lei wrote:
> On Mon, Dec 11, 2023 at 08:21:03PM -0800, Yury Norov wrote:
> > Because nmsk and irqmsk are stable, extra atomicity is not required.
> > 
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  lib/group_cpus.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > index 10dead3ab0e0..7ac94664230f 100644
> > --- a/lib/group_cpus.c
> > +++ b/lib/group_cpus.c
> > @@ -24,8 +24,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> >  		if (cpu >= nr_cpu_ids)
> >  			return;
> >  
> > -		cpumask_clear_cpu(cpu, nmsk);
> > -		cpumask_set_cpu(cpu, irqmsk);
> > +		__cpumask_clear_cpu(cpu, nmsk);
> > +		__cpumask_set_cpu(cpu, irqmsk);
> >  		cpus_per_grp--;
> >  
> >  		/* If the cpu has siblings, use them first */
> > @@ -33,8 +33,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> >  		sibl = cpu + 1;
> >  
> >  		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > -			cpumask_clear_cpu(sibl, nmsk);
> > -			cpumask_set_cpu(sibl, irqmsk);
> > +			__cpumask_clear_cpu(sibl, nmsk);
> > +			__cpumask_set_cpu(sibl, irqmsk);
> 
> I think this kind of change should be avoided; here the code is
> absolutely in the slow path, and we care about code cleanness and
> readability much more than the cycles saved by non-atomicity.

Atomic ops have special meaning and special function. This 'atomic' way
of moving a bit from one bitmap to another looks completely non-trivial
and puzzling to me.

A sequence of atomic ops is not atomic itself. Normally it's a sign of
a bug. But in this case, both masks are stable, and we don't need
atomicity at all.

It's not about performance, it's about readability.

Thanks,
Yury
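An illustration of the 'sequence of atomic ops is not atomic' point:
even with the atomic helpers there is a window between the two calls
where a concurrent observer would see the bit in neither mask:

	cpumask_clear_cpu(cpu, nmsk);	/* atomic; bit now in neither mask */
					/* <-- concurrent reader races here */
	cpumask_set_cpu(cpu, irqmsk);	/* atomic; move complete */

so per-operation atomicity helps only if the masks are shared, and in
grp_spread_init_one() they are not.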


* Re: [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()
  2023-12-12  9:46   ` Ming Lei
@ 2023-12-12 17:04     ` Yury Norov
  2023-12-13  0:06       ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-12-12 17:04 UTC (permalink / raw)
  To: Ming Lei
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Tue, Dec 12, 2023 at 05:46:53PM +0800, Ming Lei wrote:
> On Mon, Dec 11, 2023 at 08:21:02PM -0800, Yury Norov wrote:
> > The loop starts from the beginning every time we switch to the next
> > sibling mask. This is the Schlemiel the Painter's style of coding
> > because we know for sure that nmsk is clear up to the current CPU, and we
> > can just continue from the next CPU.
> > 
> > Also, we can do it nicer if we leverage the dedicated for_each() iterator,
> > and simplify the logic of clearing a bit in nmsk.
> > 
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  lib/group_cpus.c | 13 ++++++-------
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > index ee272c4cefcc..10dead3ab0e0 100644
> > --- a/lib/group_cpus.c
> > +++ b/lib/group_cpus.c
> > @@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> >  
> >  		/* If the cpu has siblings, use them first */
> >  		siblmsk = topology_sibling_cpumask(cpu);
> > -		for (sibl = -1; cpus_per_grp > 0; ) {
> > -			sibl = cpumask_next(sibl, siblmsk);
> > -			if (sibl >= nr_cpu_ids)
> > -				break;
> > -			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> > -				continue;
> > +		sibl = cpu + 1;
> 
> It doesn't have to be 'cpu + 1', see the comment below.
> 
> > +
> > +		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > +			cpumask_clear_cpu(sibl, nmsk);
> >  			cpumask_set_cpu(sibl, irqmsk);
> > -			cpus_per_grp--;
> > +			if (cpus_per_grp-- == 0)
> 
> 			if (--cpus_per_grp == 0)
 
That's right, I'll send a new version this weekend.

> > +				return;
> 
> I think for_each_cpu_and() should work just fine, because cpu has been
> cleared from nmsk, so the change can be something like the following;
> then patch 1 isn't necessary.
 
It works just fine except that it's O(N^2), where O(N) is easily
achievable. Again, it's not about performance, it's about coding
habits.
 
> 		for_each_cpu_and(sibl, siblmsk, nmsk) {
> 			cpumask_clear_cpu(sibl, nmsk);
>   			cpumask_set_cpu(sibl, irqmsk);
> 			if (--cpus_per_grp == 0)
> 				return;
> 		}
> 
> 
> Thanks,
> Ming


* Re: [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()
  2023-12-12 17:04     ` Yury Norov
@ 2023-12-13  0:06       ` Ming Lei
  2023-12-25 17:38         ` Yury Norov
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2023-12-13  0:06 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes, ming.lei

On Tue, Dec 12, 2023 at 09:04:19AM -0800, Yury Norov wrote:
> On Tue, Dec 12, 2023 at 05:46:53PM +0800, Ming Lei wrote:
> > On Mon, Dec 11, 2023 at 08:21:02PM -0800, Yury Norov wrote:
> > > The loop starts from the beginning every time we switch to the next
> > > sibling mask. This is the Schlemiel the Painter's style of coding
> > > because we know for sure that nmsk is clear up to the current CPU, and we
> > > can just continue from the next CPU.
> > > 
> > > Also, we can do it nicer if we leverage the dedicated for_each() iterator,
> > > and simplify the logic of clearing a bit in nmsk.
> > > 
> > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > ---
> > >  lib/group_cpus.c | 13 ++++++-------
> > >  1 file changed, 6 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > > index ee272c4cefcc..10dead3ab0e0 100644
> > > --- a/lib/group_cpus.c
> > > +++ b/lib/group_cpus.c
> > > @@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > >  
> > >  		/* If the cpu has siblings, use them first */
> > >  		siblmsk = topology_sibling_cpumask(cpu);
> > > -		for (sibl = -1; cpus_per_grp > 0; ) {
> > > -			sibl = cpumask_next(sibl, siblmsk);
> > > -			if (sibl >= nr_cpu_ids)
> > > -				break;
> > > -			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> > > -				continue;
> > > +		sibl = cpu + 1;
> > 
> > It doesn't have to be 'cpu + 1', see the comment below.
> > 
> > > +
> > > +		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > > +			cpumask_clear_cpu(sibl, nmsk);
> > >  			cpumask_set_cpu(sibl, irqmsk);
> > > -			cpus_per_grp--;
> > > +			if (cpus_per_grp-- == 0)
> > 
> > 			if (--cpus_per_grp == 0)
>  
> That's right, I'll send a new version this weekend.
> 
> > > +				return;
> > 
> > I think for_each_cpu_and() should work just fine, because cpu has been
> > cleared from nmsk, so the change can be something like the following;
> > then patch 1 isn't necessary.
>  
> It works just fine except that it's O(N^2), where O(N) is easily
> achievable. Again, it's not about performance, it's about coding
> habits.

Both for_each_cpu_and() and for_each_cpu_and_from() are O(N), aren't
they? Given that both are based on find_next_and_bit().

for_each_cpu_and() is simpler and more readable, and more
importantly, we can save one single-user public helper.

>  
> > 		for_each_cpu_and(sibl, siblmsk, nmsk) {
> > 			cpumask_clear_cpu(sibl, nmsk);
> >   			cpumask_set_cpu(sibl, irqmsk);
> > 			if (--cpus_per_grp == 0)
> > 				return;
> > 		}


Thanks,
Ming



* Re: [PATCH v3 3/7] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  2023-12-12 16:52     ` Yury Norov
@ 2023-12-13  0:14       ` Ming Lei
  2023-12-13 17:03         ` Yury Norov
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2023-12-13  0:14 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Tue, Dec 12, 2023 at 08:52:14AM -0800, Yury Norov wrote:
> On Tue, Dec 12, 2023 at 05:50:04PM +0800, Ming Lei wrote:
> > On Mon, Dec 11, 2023 at 08:21:03PM -0800, Yury Norov wrote:
> > > Because nmsk and irqmsk are stable, extra atomicity is not required.
> > > 
> > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > ---
> > >  lib/group_cpus.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > > index 10dead3ab0e0..7ac94664230f 100644
> > > --- a/lib/group_cpus.c
> > > +++ b/lib/group_cpus.c
> > > @@ -24,8 +24,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > >  		if (cpu >= nr_cpu_ids)
> > >  			return;
> > >  
> > > -		cpumask_clear_cpu(cpu, nmsk);
> > > -		cpumask_set_cpu(cpu, irqmsk);
> > > +		__cpumask_clear_cpu(cpu, nmsk);
> > > +		__cpumask_set_cpu(cpu, irqmsk);
> > >  		cpus_per_grp--;
> > >  
> > >  		/* If the cpu has siblings, use them first */
> > > @@ -33,8 +33,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > >  		sibl = cpu + 1;
> > >  
> > >  		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > > -			cpumask_clear_cpu(sibl, nmsk);
> > > -			cpumask_set_cpu(sibl, irqmsk);
> > > +			__cpumask_clear_cpu(sibl, nmsk);
> > > +			__cpumask_set_cpu(sibl, irqmsk);
> > 
> > I think this kind of change should be avoided; here the code is
> > absolutely in the slow path, and we care about code cleanness and
> > readability much more than the cycles saved by non-atomicity.
> 
> Atomic ops have special meaning and special function. This 'atomic' way
> of moving a bit from one bitmap to another looks completely non-trivial
> and puzzling to me.
> 
> A sequence of atomic ops is not atomic itself. Normally it's a sign of
> a bug. But in this case, both masks are stable, and we don't need
> atomicity at all.

Here we don't care about atomicity.

> 
> It's not about performance, it's about readability.

__cpumask_clear_cpu() and __cpumask_set_cpu() are more like private
helpers, and are harder to follow.

[@linux]$ git grep -n -w -E "cpumask_clear_cpu|cpumask_set_cpu" ./ | wc
    674    2055   53954
[@linux]$ git grep -n -w -E "__cpumask_clear_cpu|__cpumask_set_cpu" ./ | wc
     21      74    1580

I don't object to comment the current usage, but NAK for this change.


Thanks,
Ming



* Re: [PATCH v3 5/7] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation
  2023-12-12  4:21 ` [PATCH v3 5/7] lib/group_cpus: don't zero cpumasks in group_cpus_evenly() on allocation Yury Norov
@ 2023-12-13  0:56   ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2023-12-13  0:56 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Mon, Dec 11, 2023 at 08:21:05PM -0800, Yury Norov wrote:
> nmsk and npresmsk are both allocated with zalloc_cpumask_var(), but they
> are initialized by copying later in the code, and so may be allocated
> uninitialized.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  lib/group_cpus.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index cded3c8ea63b..c7fcd04c87bf 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -347,10 +347,10 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
>  	int ret = -ENOMEM;
>  	struct cpumask *masks = NULL;
>  
> -	if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
> +	if (!alloc_cpumask_var(&nmsk, GFP_KERNEL))
>  		return NULL;

`nmsk` is actually used by __group_cpus_evenly() only, and it should be
a local variable of __group_cpus_evenly(); can you move its allocation
into __group_cpus_evenly()?

>  
> -	if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
> +	if (!alloc_cpumask_var(&npresmsk, GFP_KERNEL))
>  		goto fail_nmsk;

The above one looks fine, especially since `npresmsk` is initialized
explicitly in group_cpus_evenly().


Thanks,
Ming



* Re: [PATCH v3 6/7] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly()
  2023-12-12  4:21 ` [PATCH v3 6/7] lib/group_cpus: drop unneeded cpumask_empty() call in __group_cpus_evenly() Yury Norov
@ 2023-12-13  0:59   ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2023-12-13  0:59 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Mon, Dec 11, 2023 at 08:21:06PM -0800, Yury Norov wrote:
> The function is called twice. The first time it's called with
> cpu_present_mask as a parameter, which can't be empty. The second time it's
> called with a mask created with cpumask_andnot(), which returns false if
> the result is an empty mask.
> 
> We can safely drop the redundant cpumask_empty() call from
> __group_cpus_evenly() and save a few cycles.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  lib/group_cpus.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index c7fcd04c87bf..664a56171a1b 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -252,9 +252,6 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
>  	nodemask_t nodemsk = NODE_MASK_NONE;
>  	struct node_groups *node_groups;
>  
> -	if (cpumask_empty(cpu_mask))
> -		return 0;
> -
>  	nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk);
>  
>  	/*
> @@ -394,9 +391,14 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
>  		curgrp = 0;
>  	else
>  		curgrp = nr_present;
> -	cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk);
> -	ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
> -				  npresmsk, nmsk, masks);
> +
> +	if (cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk))
> +		/* If npresmsk is not empty */
> +		ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
> +					  npresmsk, nmsk, masks);
> +	else
> +		ret = 0;
> +
>  	if (ret >= 0)
>  		nr_others = ret;

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming



* Re: [PATCH v3 7/7] lib/group_cpus: simplify grp_spread_init_one() for more
  2023-12-12  4:21 ` [PATCH v3 7/7] lib/group_cpus: simplify grp_spread_init_one() for more Yury Norov
@ 2023-12-13  1:06   ` Ming Lei
  2023-12-25 18:03     ` Yury Norov
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2023-12-13  1:06 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Mon, Dec 11, 2023 at 08:21:07PM -0800, Yury Norov wrote:
> The outer and inner loops of grp_spread_init_one() do the same thing -
> move a bit from nmsk to irqmsk.
> 
> The inner loop iterates over the sibling group, which includes the CPU
> picked by the outer loop. This means that we can drop the part that
> moves the bit in the outer loop.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  lib/group_cpus.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index 664a56171a1b..7aa7a6289355 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -18,14 +18,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>  	int cpu, sibl;
>  
>  	for_each_cpu(cpu, nmsk) {
> -		__cpumask_clear_cpu(cpu, nmsk);
> -		__cpumask_set_cpu(cpu, irqmsk);
> -		if (cpus_per_grp-- == 0)
> -			return;
> -
> -		/* If the cpu has siblings, use them first */
>  		siblmsk = topology_sibling_cpumask(cpu);
> -		sibl = cpu + 1;
> +		sibl = cpu;
>  
>  		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
>  			__cpumask_clear_cpu(sibl, nmsk);

Correctness of the above change requires that 'cpu' be included in
topology_sibling_cpumask(cpu); however, I'm not sure that is always true,
see the following comment in Documentation/arch/x86/topology.rst

`
  - topology_sibling_cpumask():

    The cpumask contains all online threads in the core to which a thread
    belongs.
`

Thanks, 
Ming
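If the sibling mask cannot be relied on to contain the CPU itself, a
guard along these lines would preserve the old behaviour. This is a
hypothetical sketch prompted by the concern above, not code posted in
the series:

	/* inside the outer loop, before walking the siblings */
	if (!cpumask_test_cpu(cpu, topology_sibling_cpumask(cpu))) {
		/* cpu is not in its own sibling mask: move it explicitly */
		__cpumask_clear_cpu(cpu, nmsk);
		__cpumask_set_cpu(cpu, irqmsk);
		if (--cpus_per_grp == 0)
			return;
	}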



* Re: [PATCH v3 3/7] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  2023-12-13  0:14       ` Ming Lei
@ 2023-12-13 17:03         ` Yury Norov
  2023-12-14  0:43           ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Yury Norov @ 2023-12-13 17:03 UTC (permalink / raw)
  To: Ming Lei
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Wed, Dec 13, 2023 at 08:14:45AM +0800, Ming Lei wrote:
> On Tue, Dec 12, 2023 at 08:52:14AM -0800, Yury Norov wrote:
> > On Tue, Dec 12, 2023 at 05:50:04PM +0800, Ming Lei wrote:
> > > On Mon, Dec 11, 2023 at 08:21:03PM -0800, Yury Norov wrote:
> > > > Because nmsk and irqmsk are stable, extra atomicity is not required.
> > > > 
> > > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > > ---
> > > >  lib/group_cpus.c | 8 ++++----
> > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > > > index 10dead3ab0e0..7ac94664230f 100644
> > > > --- a/lib/group_cpus.c
> > > > +++ b/lib/group_cpus.c
> > > > @@ -24,8 +24,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > > >  		if (cpu >= nr_cpu_ids)
> > > >  			return;
> > > >  
> > > > -		cpumask_clear_cpu(cpu, nmsk);
> > > > -		cpumask_set_cpu(cpu, irqmsk);
> > > > +		__cpumask_clear_cpu(cpu, nmsk);
> > > > +		__cpumask_set_cpu(cpu, irqmsk);
> > > >  		cpus_per_grp--;
> > > >  
> > > >  		/* If the cpu has siblings, use them first */
> > > > @@ -33,8 +33,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > > >  		sibl = cpu + 1;
> > > >  
> > > >  		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > > > -			cpumask_clear_cpu(sibl, nmsk);
> > > > -			cpumask_set_cpu(sibl, irqmsk);
> > > > +			__cpumask_clear_cpu(sibl, nmsk);
> > > > +			__cpumask_set_cpu(sibl, irqmsk);
> > > 
> > > I think this kind of change should be avoided; here the code is
> > > absolutely in the slow path, and we care about code cleanness and
> > > readability much more than the cycles saved by non-atomicity.
> > 
> > Atomic ops have special meaning and special function. This 'atomic' way
> > of moving a bit from one bitmap to another looks completely non-trivial
> > and puzzling to me.
> > 
> > A sequence of atomic ops is not atomic itself. Normally it's a sign of
> > a bug. But in this case, both masks are stable, and we don't need
> > atomicity at all.
> 
> Here we don't care about atomicity.
> 
> > 
> > It's not about performance, it's about readability.
> 
> __cpumask_clear_cpu() and __cpumask_set_cpu() are more like private
> helpers, and are harder to follow.

No, that's not true. The non-atomic version of the function is of course
not a private helper.
 
> [@linux]$ git grep -n -w -E "cpumask_clear_cpu|cpumask_set_cpu" ./ | wc
>     674    2055   53954
> [@linux]$ git grep -n -w -E "__cpumask_clear_cpu|__cpumask_set_cpu" ./ | wc
>      21      74    1580
> 
> I don't object to comment the current usage, but NAK for this change.

No problem, I'll add you NAK.


* Re: [PATCH v3 3/7] lib/group_cpus: relax atomicity requirement in grp_spread_init_one()
  2023-12-13 17:03         ` Yury Norov
@ 2023-12-14  0:43           ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2023-12-14  0:43 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Wed, Dec 13, 2023 at 09:03:17AM -0800, Yury Norov wrote:
> On Wed, Dec 13, 2023 at 08:14:45AM +0800, Ming Lei wrote:
> > On Tue, Dec 12, 2023 at 08:52:14AM -0800, Yury Norov wrote:
> > > On Tue, Dec 12, 2023 at 05:50:04PM +0800, Ming Lei wrote:
> > > > On Mon, Dec 11, 2023 at 08:21:03PM -0800, Yury Norov wrote:
> > > > > Because nmsk and irqmsk are stable, extra atomicity is not required.
> > > > > 
> > > > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > > > ---
> > > > >  lib/group_cpus.c | 8 ++++----
> > > > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > > > > index 10dead3ab0e0..7ac94664230f 100644
> > > > > --- a/lib/group_cpus.c
> > > > > +++ b/lib/group_cpus.c
> > > > > @@ -24,8 +24,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > > > >  		if (cpu >= nr_cpu_ids)
> > > > >  			return;
> > > > >  
> > > > > -		cpumask_clear_cpu(cpu, nmsk);
> > > > > -		cpumask_set_cpu(cpu, irqmsk);
> > > > > +		__cpumask_clear_cpu(cpu, nmsk);
> > > > > +		__cpumask_set_cpu(cpu, irqmsk);
> > > > >  		cpus_per_grp--;
> > > > >  
> > > > >  		/* If the cpu has siblings, use them first */
> > > > > @@ -33,8 +33,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > > > >  		sibl = cpu + 1;
> > > > >  
> > > > >  		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > > > > -			cpumask_clear_cpu(sibl, nmsk);
> > > > > -			cpumask_set_cpu(sibl, irqmsk);
> > > > > +			__cpumask_clear_cpu(sibl, nmsk);
> > > > > +			__cpumask_set_cpu(sibl, irqmsk);
> > > > 
> > > > I think this kind of change should be avoided; here the code is
> > > > absolutely in the slow path, and we care about code cleanness and
> > > > readability much more than the cycles saved by non-atomicity.
> > > 
> > > Atomic ops have special meaning and special function. This 'atomic' way
> > > of moving a bit from one bitmap to another looks completely non-trivial
> > > and puzzling to me.
> > > 
> > > A sequence of atomic ops is not atomic itself. Normally it's a sign of
> > > a bug. But in this case, both masks are stable, and we don't need
> > > atomicity at all.
> > 
> > Here we don't care about atomicity.
> > 
> > > 
> > > It's not about performance, it's about readability.
> > 
> > __cpumask_clear_cpu() and __cpumask_set_cpu() are more like private
> > helpers, and are harder to follow.
> 
> No, that's not true. The non-atomic version of the function is of course
> not a private helper.
>  
> > [@linux]$ git grep -n -w -E "cpumask_clear_cpu|cpumask_set_cpu" ./ | wc
> >     674    2055   53954
> > [@linux]$ git grep -n -w -E "__cpumask_clear_cpu|__cpumask_set_cpu" ./ | wc
> >      21      74    1580
> > 
> > I don't object to comment the current usage, but NAK for this change.
> 
> No problem, I'll add you NAK.

You can add the following words in the meantime:

__cpumask_clear_cpu() and __cpumask_set_cpu() were added in commit 6c8557bdb28d
("smp, cpumask: Use non-atomic cpumask_{set,clear}_cpu()") for a fast code
path (smp_call_function_many()).

We have ~670 users of cpumask_clear_cpu & cpumask_set_cpu; lots of them
fall into the same category as group_cpus.c (atomicity doesn't matter, and
they are not in a fast code path), and needn't change to __cpumask_clear_cpu()
and __cpumask_set_cpu(). Otherwise, this change may encourage updating
others to the __cpumask_* versions.


Thanks, 
Ming



* Re: [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()
  2023-12-13  0:06       ` Ming Lei
@ 2023-12-25 17:38         ` Yury Norov
  0 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-12-25 17:38 UTC (permalink / raw)
  To: Ming Lei
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Wed, Dec 13, 2023 at 08:06:48AM +0800, Ming Lei wrote:
> On Tue, Dec 12, 2023 at 09:04:19AM -0800, Yury Norov wrote:
> > On Tue, Dec 12, 2023 at 05:46:53PM +0800, Ming Lei wrote:
> > > On Mon, Dec 11, 2023 at 08:21:02PM -0800, Yury Norov wrote:
> > > > The loop starts from the beginning every time we switch to the next
> > > > sibling mask. This is the Schlemiel the Painter's style of coding
> > > > because we know for sure that nmsk is clear up to the current CPU, and we
> > > > can just continue from the next CPU.
> > > > 
> > > > Also, we can do it nicer if we leverage the dedicated for_each() iterator,
> > > > and simplify the logic of clearing a bit in nmsk.
> > > > 
> > > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > > ---
> > > >  lib/group_cpus.c | 13 ++++++-------
> > > >  1 file changed, 6 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > > > index ee272c4cefcc..10dead3ab0e0 100644
> > > > --- a/lib/group_cpus.c
> > > > +++ b/lib/group_cpus.c
> > > > @@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > > >  
> > > >  		/* If the cpu has siblings, use them first */
> > > >  		siblmsk = topology_sibling_cpumask(cpu);
> > > > -		for (sibl = -1; cpus_per_grp > 0; ) {
> > > > -			sibl = cpumask_next(sibl, siblmsk);
> > > > -			if (sibl >= nr_cpu_ids)
> > > > -				break;
> > > > -			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> > > > -				continue;
> > > > +		sibl = cpu + 1;
> > > 
> > > It doesn't have to be 'cpu + 1', see the comment below.
> > > 
> > > > +
> > > > +		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > > > +			cpumask_clear_cpu(sibl, nmsk);
> > > >  			cpumask_set_cpu(sibl, irqmsk);
> > > > -			cpus_per_grp--;
> > > > +			if (cpus_per_grp-- == 0)
> > > 
> > > 			if (--cpus_per_grp == 0)
> >  
> > That's right, I'll send a new version this weekend.
> > 
> > > > +				return;
> > > 
> > > I think for_each_cpu_and() should work just fine, because cpu has been
> > > cleared from nmsk, so the change can be something like the following;
> > > then patch 1 isn't necessary.
> >  
> > It works just fine except that it's O(N^2), where O(N) is easily
> > achievable. Again, it's not about performance, it's about coding
> > habits.
> 
> Both for_each_cpu_and() and for_each_cpu_and_from() are O(N), aren't
> they? Given that both are based on find_next_and_bit().

for_each_cpu_and() is the same Schlemiel the Painter's code that the
plain for() loop was.
 
> for_each_cpu_and() is simpler and more readable, and more
> importantly, we can save one single-user public helper.
> 
> >  
> > > 		for_each_cpu_and(sibl, siblmsk, nmsk) {
> > > 			cpumask_clear_cpu(sibl, nmsk);
> > >   			cpumask_set_cpu(sibl, irqmsk);
> > > 			if (--cpus_per_grp == 0)
> > > 				return;
> > > 		}
> 
> 
> Thanks,
> Ming
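For what it's worth, the two readings are compatible: a single
for_each_cpu_and() walk is O(N), which is the point above, but
grp_spread_init_one() runs one such walk per outer-loop CPU, each
restarting its scan at bit 0, so together the walks cost on the order of
1 + 2 + ... + N = O(N^2). With for_each_cpu_and_from() each walk resumes
where the previous one stopped, so all inner walks combined traverse the
masks once.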


* Re: [PATCH v3 7/7] lib/group_cpus: simplify grp_spread_init_one() for more
  2023-12-13  1:06   ` Ming Lei
@ 2023-12-25 18:03     ` Yury Norov
  0 siblings, 0 replies; 21+ messages in thread
From: Yury Norov @ 2023-12-25 18:03 UTC (permalink / raw)
  To: Ming Lei
  Cc: Andrew Morton, Thomas Gleixner, linux-kernel, Andy Shevchenko,
	Rasmus Villemoes

On Wed, Dec 13, 2023 at 09:06:12AM +0800, Ming Lei wrote:
> On Mon, Dec 11, 2023 at 08:21:07PM -0800, Yury Norov wrote:
> > The outer and inner loops of grp_spread_init_one() do the same thing -
> > move a bit from nmsk to irqmsk.
> > 
> > The inner loop iterates over the sibling group, which includes the CPU
> > picked by the outer loop. This means that we can drop the part that
> > moves the bit in the outer loop.
> > 
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  lib/group_cpus.c | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> > 
> > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > index 664a56171a1b..7aa7a6289355 100644
> > --- a/lib/group_cpus.c
> > +++ b/lib/group_cpus.c
> > @@ -18,14 +18,8 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> >  	int cpu, sibl;
> >  
> >  	for_each_cpu(cpu, nmsk) {
> > -		__cpumask_clear_cpu(cpu, nmsk);
> > -		__cpumask_set_cpu(cpu, irqmsk);
> > -		if (cpus_per_grp-- == 0)
> > -			return;
> > -
> > -		/* If the cpu has siblings, use them first */
> >  		siblmsk = topology_sibling_cpumask(cpu);
> > -		sibl = cpu + 1;
> > +		sibl = cpu;
> >  
> >  		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> >  			__cpumask_clear_cpu(sibl, nmsk);
> 
> Correctness of the above change requires that 'cpu' be included in
> topology_sibling_cpumask(cpu); however, I'm not sure that is always true,
> see the following comment in Documentation/arch/x86/topology.rst
> 
> `
>   - topology_sibling_cpumask():
> 
>     The cpumask contains all online threads in the core to which a thread
>     belongs.
> `

It's kind of nontrivial to spread IRQs on offline CPUs, but
technically the above seems correct. I'll drop the patch then.

