cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] bitmap: optimize API usage
@ 2024-05-13 22:01 Yury Norov
  2024-05-13 22:01 ` [PATCH 1/6] smp: optimize smp_call_function_many_cond() Yury Norov
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Yury Norov @ 2024-05-13 22:01 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Waiman Long, Yury Norov,
	Zefan Li, cgroups

In a few places bitmap API is called with such a combination of
parameters that makes the call unneeded, or there's a trivial cheaper
alternative.

For example, cpumask_copy(dst, src) where dst == src is simply a no-op.
This series addresses such cases spotted on x86_64 with LTP.

All the patches are independent and may be applied separately in
corresponding subsystems. Or I can take them in bitmap branch, if it's
more convenient.

Yury Norov (6):
  smp: optimize smp_call_function_many_cond()
  sched/topology: optimize topology_span_sane()
  driver core: cpu: optimize print_cpus_isolated()
  genirq: optimze irq_do_set_affinity()
  cgroup/cpuset: optimize cpuset_mems_allowed_intersects()
  tick/common: optimize cpumask_equal() usage

 drivers/base/cpu.c        |  6 ++++--
 kernel/cgroup/cpuset.c    |  3 +++
 kernel/irq/manage.c       |  3 ++-
 kernel/sched/topology.c   |  2 +-
 kernel/smp.c              |  5 ++++-
 kernel/time/tick-common.c | 15 +++++++++++----
 6 files changed, 25 insertions(+), 9 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] smp: optimize smp_call_function_many_cond()
  2024-05-13 22:01 [PATCH 0/6] bitmap: optimize API usage Yury Norov
@ 2024-05-13 22:01 ` Yury Norov
  2024-05-13 22:01 ` [PATCH 2/6] sched/topology: optimize topology_span_sane() Yury Norov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Yury Norov @ 2024-05-13 22:01 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Waiman Long, Yury Norov,
	Zefan Li, cgroups

The function may be called with mask == cpu_online_mask, and in this
case we can use a cheaper cpumask_cooy() instead of cpumask_and().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/smp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f085ebcdf9e7..6f41214a1b54 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -798,7 +798,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	if (run_remote) {
 		cfd = this_cpu_ptr(&cfd_data);
-		cpumask_and(cfd->cpumask, mask, cpu_online_mask);
+		if (mask == cpu_online_mask)
+			cpumask_copy(cfd->cpumask, mask);
+		else
+			cpumask_and(cfd->cpumask, mask, cpu_online_mask);
 		__cpumask_clear_cpu(this_cpu, cfd->cpumask);
 
 		cpumask_clear(cfd->cpumask_ipi);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] sched/topology: optimize topology_span_sane()
  2024-05-13 22:01 [PATCH 0/6] bitmap: optimize API usage Yury Norov
  2024-05-13 22:01 ` [PATCH 1/6] smp: optimize smp_call_function_many_cond() Yury Norov
@ 2024-05-13 22:01 ` Yury Norov
  2024-05-14 20:53   ` Christophe JAILLET
  2024-05-13 22:01 ` [PATCH 3/6] driver core: cpu: optimize print_cpus_isolated() Yury Norov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Yury Norov @ 2024-05-13 22:01 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Waiman Long, Yury Norov,
	Zefan Li, cgroups

The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
even though cpu != i. In such case, cpumask_equal() would always return
true, and we can proceed to the next CPU immediately.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/sched/topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 99ea5986038c..eb9eb17b0efa 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2360,7 +2360,7 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
 	 * breaks the linking done for an earlier span.
 	 */
 	for_each_cpu(i, cpu_map) {
-		if (i == cpu)
+		if (i == cpu || tl->mask(cpu) == tl->mask(i))
 			continue;
 		/*
 		 * We should 'and' all those masks with 'cpu_map' to exactly
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] driver core: cpu: optimize print_cpus_isolated()
  2024-05-13 22:01 [PATCH 0/6] bitmap: optimize API usage Yury Norov
  2024-05-13 22:01 ` [PATCH 1/6] smp: optimize smp_call_function_many_cond() Yury Norov
  2024-05-13 22:01 ` [PATCH 2/6] sched/topology: optimize topology_span_sane() Yury Norov
@ 2024-05-13 22:01 ` Yury Norov
  2024-05-14 21:02   ` Thomas Gleixner
  2024-05-13 22:01 ` [PATCH 4/6] genirq: optimize irq_do_set_affinity() Yury Norov
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Yury Norov @ 2024-05-13 22:01 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Waiman Long, Yury Norov,
	Zefan Li, cgroups

The function may be called with housekeeping_cpumask == cpu_possible_mask,
and in such case the 'isolated' cpumask would be just empty.

We can call cpumask_clear() in that case, and save CPU cycles.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/base/cpu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 56fba44ba391..1322957286dd 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -282,8 +282,10 @@ static ssize_t print_cpus_isolated(struct device *dev,
 	if (!alloc_cpumask_var(&isolated, GFP_KERNEL))
 		return -ENOMEM;
 
-	cpumask_andnot(isolated, cpu_possible_mask,
-		       housekeeping_cpumask(HK_TYPE_DOMAIN));
+	if (cpu_possible_mask != housekeeping_cpumask(HK_TYPE_DOMAIN))
+		cpumask_andnot(isolated, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
+	else
+		cpumask_clear(isolated);
 	len = sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(isolated));
 
 	free_cpumask_var(isolated);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] genirq: optimize irq_do_set_affinity()
  2024-05-13 22:01 [PATCH 0/6] bitmap: optimize API usage Yury Norov
                   ` (2 preceding siblings ...)
  2024-05-13 22:01 ` [PATCH 3/6] driver core: cpu: optimize print_cpus_isolated() Yury Norov
@ 2024-05-13 22:01 ` Yury Norov
  2024-05-14 12:51   ` Jinjie Ruan
  2024-05-13 22:01 ` [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects() Yury Norov
  2024-05-13 22:01 ` [PATCH 6/6] tick/common: optimize cpumask_equal() usage Yury Norov
  5 siblings, 1 reply; 18+ messages in thread
From: Yury Norov @ 2024-05-13 22:01 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Waiman Long, Yury Norov,
	Zefan Li, cgroups

If mask == desc->irq_common_data.affinity, copying one to another is
useless, and we can just skip it.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/irq/manage.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index bf9ae8a8686f..ad9ed9fdf919 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -285,7 +285,8 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	switch (ret) {
 	case IRQ_SET_MASK_OK:
 	case IRQ_SET_MASK_OK_DONE:
-		cpumask_copy(desc->irq_common_data.affinity, mask);
+		if (desc->irq_common_data.affinity != mask)
+			cpumask_copy(desc->irq_common_data.affinity, mask);
 		fallthrough;
 	case IRQ_SET_MASK_OK_NOCOPY:
 		irq_validate_effective_affinity(data);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects()
  2024-05-13 22:01 [PATCH 0/6] bitmap: optimize API usage Yury Norov
                   ` (3 preceding siblings ...)
  2024-05-13 22:01 ` [PATCH 4/6] genirq: optimize irq_do_set_affinity() Yury Norov
@ 2024-05-13 22:01 ` Yury Norov
  2024-05-14 16:47   ` Waiman Long
  2024-05-13 22:01 ` [PATCH 6/6] tick/common: optimize cpumask_equal() usage Yury Norov
  5 siblings, 1 reply; 18+ messages in thread
From: Yury Norov @ 2024-05-13 22:01 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Waiman Long, Yury Norov,
	Zefan Li, cgroups

If the function is called with tsk1 == tsk2, we know for sure that their
mems_allowed nodes do intersect, and so we can return immediately instead
of checking the nodes content.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/cgroup/cpuset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4237c8748715..47ed206d4890 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -5010,6 +5010,9 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
 int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
 				   const struct task_struct *tsk2)
 {
+	if (tsk1 == tsk2)
+		return 1;
+
 	return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed);
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] tick/common: optimize cpumask_equal() usage
  2024-05-13 22:01 [PATCH 0/6] bitmap: optimize API usage Yury Norov
                   ` (4 preceding siblings ...)
  2024-05-13 22:01 ` [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects() Yury Norov
@ 2024-05-13 22:01 ` Yury Norov
  2024-05-14  8:31   ` Thomas Gleixner
  5 siblings, 1 reply; 18+ messages in thread
From: Yury Norov @ 2024-05-13 22:01 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Waiman Long, Yury Norov,
	Zefan Li, cgroups

Some functions in the file call cpumask_equal() with src1p == src2p.
We can obviously just skip comparison entirely in this case.

This patch fixes cpumask_equal invocations when boot-test or LTP detect
such condition.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/time/tick-common.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..b31fef292833 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -253,7 +253,8 @@ static void tick_setup_device(struct tick_device *td,
 	 * When the device is not per cpu, pin the interrupt to the
 	 * current cpu:
 	 */
-	if (!cpumask_equal(newdev->cpumask, cpumask))
+	if (newdev->cpumask != cpumask &&
+			!cpumask_equal(newdev->cpumask, cpumask))
 		irq_set_affinity(newdev->irq, cpumask);
 
 	/*
@@ -288,14 +289,19 @@ static bool tick_check_percpu(struct clock_event_device *curdev,
 {
 	if (!cpumask_test_cpu(cpu, newdev->cpumask))
 		return false;
-	if (cpumask_equal(newdev->cpumask, cpumask_of(cpu)))
+	if (newdev->cpumask == cpumask_of(cpu) ||
+			cpumask_equal(newdev->cpumask, cpumask_of(cpu)))
 		return true;
 	/* Check if irq affinity can be set */
 	if (newdev->irq >= 0 && !irq_can_set_affinity(newdev->irq))
 		return false;
 	/* Prefer an existing cpu local device */
-	if (curdev && cpumask_equal(curdev->cpumask, cpumask_of(cpu)))
+	if (!curdev)
+		return true;
+	if (curdev->cpumask == cpumask_of(cpu) ||
+			cpumask_equal(curdev->cpumask, cpumask_of(cpu)))
 		return false;
+
 	return true;
 }
 
@@ -316,7 +322,8 @@ static bool tick_check_preferred(struct clock_event_device *curdev,
 	 */
 	return !curdev ||
 		newdev->rating > curdev->rating ||
-	       !cpumask_equal(curdev->cpumask, newdev->cpumask);
+		!(curdev->cpumask == newdev->cpumask &&
+			cpumask_equal(curdev->cpumask, newdev->cpumask));
 }
 
 /*
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/6] tick/common: optimize cpumask_equal() usage
  2024-05-13 22:01 ` [PATCH 6/6] tick/common: optimize cpumask_equal() usage Yury Norov
@ 2024-05-14  8:31   ` Thomas Gleixner
  2024-05-14  8:42     ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2024-05-14  8:31 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Valentin Schneider,
	Vincent Guittot, Waiman Long, Yury Norov, Zefan Li, cgroups

On Mon, May 13 2024 at 15:01, Yury Norov wrote:
> Some functions in the file call cpumask_equal() with src1p == src2p.
> We can obviously just skip comparison entirely in this case.
>
> This patch fixes cpumask_equal invocations when boot-test or LTP detect
> such condition.

Please write your changelogs in imperative mood.

git grep 'This patch' Documentation/process/

Also please see Documentation/process/maintainer-tip.rst

> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  kernel/time/tick-common.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index d88b13076b79..b31fef292833 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -253,7 +253,8 @@ static void tick_setup_device(struct tick_device *td,
>  	 * When the device is not per cpu, pin the interrupt to the
>  	 * current cpu:
>  	 */
> -	if (!cpumask_equal(newdev->cpumask, cpumask))
> +	if (newdev->cpumask != cpumask &&
> +			!cpumask_equal(newdev->cpumask, cpumask))
>  		irq_set_affinity(newdev->irq, cpumask);

I'm not seeing the benefit. This is slow path setup code so the extra
comparison does not really buy anything aside of malformatted line
breaks.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/6] tick/common: optimize cpumask_equal() usage
  2024-05-14  8:31   ` Thomas Gleixner
@ 2024-05-14  8:42     ` Thomas Gleixner
  2024-05-14 16:47       ` Yury Norov
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2024-05-14  8:42 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Valentin Schneider,
	Vincent Guittot, Waiman Long, Yury Norov, Zefan Li, cgroups

On Tue, May 14 2024 at 10:31, Thomas Gleixner wrote:
> On Mon, May 13 2024 at 15:01, Yury Norov wrote:
>> Some functions in the file call cpumask_equal() with src1p == src2p.
>> We can obviously just skip comparison entirely in this case.
>>
>> This patch fixes cpumask_equal invocations when boot-test or LTP detect
>> such condition.
>
> Please write your changelogs in imperative mood.
>
> git grep 'This patch' Documentation/process/
>
> Also please see Documentation/process/maintainer-tip.rst
>
>> Signed-off-by: Yury Norov <yury.norov@gmail.com>
>> ---
>>  kernel/time/tick-common.c | 15 +++++++++++----
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
>> index d88b13076b79..b31fef292833 100644
>> --- a/kernel/time/tick-common.c
>> +++ b/kernel/time/tick-common.c
>> @@ -253,7 +253,8 @@ static void tick_setup_device(struct tick_device *td,
>>  	 * When the device is not per cpu, pin the interrupt to the
>>  	 * current cpu:
>>  	 */
>> -	if (!cpumask_equal(newdev->cpumask, cpumask))
>> +	if (newdev->cpumask != cpumask &&
>> +			!cpumask_equal(newdev->cpumask, cpumask))
>>  		irq_set_affinity(newdev->irq, cpumask);
>
> I'm not seeing the benefit. This is slow path setup code so the extra
> comparison does not really buy anything aside of malformatted line
> breaks.

Instead of sprinkling these conditional all over the place, can't you
just do the obvious and check for ptr1 == ptr2 in bitmap_copy() and
bitmap_equal()?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/6] genirq: optimize irq_do_set_affinity()
  2024-05-13 22:01 ` [PATCH 4/6] genirq: optimize irq_do_set_affinity() Yury Norov
@ 2024-05-14 12:51   ` Jinjie Ruan
  2024-05-14 16:16     ` Yury Norov
  0 siblings, 1 reply; 18+ messages in thread
From: Jinjie Ruan @ 2024-05-14 12:51 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Waiman Long, Zefan Li,
	cgroups



On 2024/5/14 6:01, Yury Norov wrote:
> If mask == desc->irq_common_data.affinity, copying one to another is
> useless, and we can just skip it.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  kernel/irq/manage.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index bf9ae8a8686f..ad9ed9fdf919 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -285,7 +285,8 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  	switch (ret) {
>  	case IRQ_SET_MASK_OK:
>  	case IRQ_SET_MASK_OK_DONE:
> -		cpumask_copy(desc->irq_common_data.affinity, mask);
> +		if (desc->irq_common_data.affinity != mask)
> +			cpumask_copy(desc->irq_common_data.affinity, mask);

It seems that mask is a pointer, shouldn't use "cpumask_equal"?

>  		fallthrough;
>  	case IRQ_SET_MASK_OK_NOCOPY:
>  		irq_validate_effective_affinity(data);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/6] genirq: optimize irq_do_set_affinity()
  2024-05-14 12:51   ` Jinjie Ruan
@ 2024-05-14 16:16     ` Yury Norov
  0 siblings, 0 replies; 18+ messages in thread
From: Yury Norov @ 2024-05-14 16:16 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Waiman Long, Zefan Li,
	cgroups

On Tue, May 14, 2024 at 5:51 AM Jinjie Ruan <ruanjinjie@huawei.com> wrote:
>
>
>
> On 2024/5/14 6:01, Yury Norov wrote:
> > If mask == desc->irq_common_data.affinity, copying one to another is
> > useless, and we can just skip it.
> >
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  kernel/irq/manage.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index bf9ae8a8686f..ad9ed9fdf919 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -285,7 +285,8 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >       switch (ret) {
> >       case IRQ_SET_MASK_OK:
> >       case IRQ_SET_MASK_OK_DONE:
> > -             cpumask_copy(desc->irq_common_data.affinity, mask);
> > +             if (desc->irq_common_data.affinity != mask)
> > +                     cpumask_copy(desc->irq_common_data.affinity, mask);
>
> It seems that mask is a pointer, shouldn't use "cpumask_equal"?

cpumask_equal() is O(N), just as cpumask_copy(), so we'll have no
benefit if the masks are equal, and will double slow it if they aren't
in the worst case.

On the other hand, pointers comparison is O(1), a very quick tasks,
even more the pointers are already in registers.

> >               fallthrough;
> >       case IRQ_SET_MASK_OK_NOCOPY:
> >               irq_validate_effective_affinity(data);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/6] tick/common: optimize cpumask_equal() usage
  2024-05-14  8:42     ` Thomas Gleixner
@ 2024-05-14 16:47       ` Yury Norov
  2024-05-14 20:47         ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Yury Norov @ 2024-05-14 16:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Valentin Schneider,
	Vincent Guittot, Waiman Long, Zefan Li, cgroups

On Tue, May 14, 2024 at 1:42 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, May 14 2024 at 10:31, Thomas Gleixner wrote:
> > On Mon, May 13 2024 at 15:01, Yury Norov wrote:
> >> Some functions in the file call cpumask_equal() with src1p == src2p.
> >> We can obviously just skip comparison entirely in this case.
> >>
> >> This patch fixes cpumask_equal invocations when boot-test or LTP detect
> >> such condition.
> >
> > Please write your changelogs in imperative mood.
> >
> > git grep 'This patch' Documentation/process/
> >
> > Also please see Documentation/process/maintainer-tip.rst
> >
> >> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> >> ---
> >>  kernel/time/tick-common.c | 15 +++++++++++----
> >>  1 file changed, 11 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> >> index d88b13076b79..b31fef292833 100644
> >> --- a/kernel/time/tick-common.c
> >> +++ b/kernel/time/tick-common.c
> >> @@ -253,7 +253,8 @@ static void tick_setup_device(struct tick_device *td,
> >>       * When the device is not per cpu, pin the interrupt to the
> >>       * current cpu:
> >>       */
> >> -    if (!cpumask_equal(newdev->cpumask, cpumask))
> >> +    if (newdev->cpumask != cpumask &&
> >> +                    !cpumask_equal(newdev->cpumask, cpumask))
> >>              irq_set_affinity(newdev->irq, cpumask);
> >
> > I'm not seeing the benefit. This is slow path setup code so the extra
> > comparison does not really buy anything aside of malformatted line
> > breaks.
>
> Instead of sprinkling these conditional all over the place, can't you
> just do the obvious and check for ptr1 == ptr2 in bitmap_copy() and
> bitmap_equal()?

I proposed this a while (few years) ago, and it has been rejected. On
bitmaps level we decided not to do that for the reasons memcpy() and
memcmp() doesn't, and on cpumasks and nodemasks level it hasn't
been discussed at all.

Now that most of bitmap ops have inline and outline implementation,
we technically can move this checks in outline code, as inline bitmap
ops are very lightweight already.

So I see the following options:
 - Implement these sanity checks in outline bitmap API (lib/bitmap.c);
 - Implement them on cpumask and nodemask level; or
 - add a new family of helpers that do this check, like
  bitmap_copy_if_needed() (better name appreciated).

The argument against #1 and #2 these days was that memcpy() and
similarly bitmap_copy() with dst == src may be a sign of error, and
we don't want to add a code that optimizes for it.

Now, I ran the kernel through the LTP test and in practice all the
cases that I spot look pretty normal. So I can continue sprinkling
the checks once a few years, or do something like described above.

Can you / everyone please share your opinion?

Thanks,
Yury

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects()
  2024-05-13 22:01 ` [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects() Yury Norov
@ 2024-05-14 16:47   ` Waiman Long
  2024-05-14 16:50     ` Tejun Heo
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2024-05-14 16:47 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Zefan Li, cgroups


On 5/13/24 18:01, Yury Norov wrote:
> If the function is called with tsk1 == tsk2, we know for sure that their
> mems_allowed nodes do intersect, and so we can return immediately instead
> of checking the nodes content.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>   kernel/cgroup/cpuset.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 4237c8748715..47ed206d4890 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -5010,6 +5010,9 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
>   int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
>   				   const struct task_struct *tsk2)
>   {
> +	if (tsk1 == tsk2)
> +		return 1;
> +
>   	return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed);
>   }
>   
Reviewed-by: Waiman Long <longman@redhat.com>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects()
  2024-05-14 16:47   ` Waiman Long
@ 2024-05-14 16:50     ` Tejun Heo
  2024-05-14 16:55       ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2024-05-14 16:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Yury Norov, linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Zefan Li, cgroups

On Tue, May 14, 2024 at 12:47:59PM -0400, Waiman Long wrote:
> > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > index 4237c8748715..47ed206d4890 100644
> > --- a/kernel/cgroup/cpuset.c
> > +++ b/kernel/cgroup/cpuset.c
> > @@ -5010,6 +5010,9 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
> >   int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
> >   				   const struct task_struct *tsk2)
> >   {
> > +	if (tsk1 == tsk2)
> > +		return 1;
> > +
> >   	return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed);
> >   }
> Reviewed-by: Waiman Long <longman@redhat.com>

I'm not sure this is worth the added code.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects()
  2024-05-14 16:50     ` Tejun Heo
@ 2024-05-14 16:55       ` Waiman Long
  0 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2024-05-14 16:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yury Norov, linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Zefan Li, cgroups


On 5/14/24 12:50, Tejun Heo wrote:
> On Tue, May 14, 2024 at 12:47:59PM -0400, Waiman Long wrote:
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 4237c8748715..47ed206d4890 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -5010,6 +5010,9 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
>>>    int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
>>>    				   const struct task_struct *tsk2)
>>>    {
>>> +	if (tsk1 == tsk2)
>>> +		return 1;
>>> +
>>>    	return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed);
>>>    }
>> Reviewed-by: Waiman Long <longman@redhat.com>
> I'm not sure this is worth the added code.

Yes, I do have a second thought afterward. The only caller of 
cpuset_mems_allowed_intersects() is oom_cpuset_eligible() in 
mm/oom_kill.c. So I believe a better fix is to avoid calling 
cpuset_mems_allowed_intersects() there if current == tsk.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/6] tick/common: optimize cpumask_equal() usage
  2024-05-14 16:47       ` Yury Norov
@ 2024-05-14 20:47         ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2024-05-14 20:47 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Valentin Schneider,
	Vincent Guittot, Waiman Long, Zefan Li, cgroups

On Tue, May 14 2024 at 09:47, Yury Norov wrote:
>> Instead of sprinkling these conditional all over the place, can't you
>> just do the obvious and check for ptr1 == ptr2 in bitmap_copy() and
>> bitmap_equal()?
>
> I proposed this a while (few years) ago, and it has been rejected. On
> bitmaps level we decided not to do that for the reasons memcpy() and
> memcmp() doesn't, and on cpumasks and nodemasks level it hasn't
> been discussed at all.
>
> Now that most of bitmap ops have inline and outline implementation,
> we technically can move this checks in outline code, as inline bitmap
> ops are very lightweight already.
>
> So I see the following options:
>  - Implement these sanity checks in outline bitmap API (lib/bitmap.c);
>  - Implement them on cpumask and nodemask level; or
>  - add a new family of helpers that do this check, like
>   bitmap_copy_if_needed() (better name appreciated).
>
> The argument against #1 and #2 these days was that memcpy() and
> similarly bitmap_copy() with dst == src may be a sign of error, and
> we don't want to add a code that optimizes for it.

That's a fair argument.

> Now, I ran the kernel through the LTP test and in practice all the
> cases that I spot look pretty normal. So I can continue sprinkling
> the checks once a few years, or do something like described above.

I don't see these checks as valuable in most cases and I detest them as
they make the code harder to read.

Except for smp_call_function_many_cond() and to a lesser extent
irq_do_set_affinity() none of them you added really matters.

Though it might be worth to have helper functions which make it obvious
that the src == dst case is intentional.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/6] sched/topology: optimize topology_span_sane()
  2024-05-13 22:01 ` [PATCH 2/6] sched/topology: optimize topology_span_sane() Yury Norov
@ 2024-05-14 20:53   ` Christophe JAILLET
  0 siblings, 0 replies; 18+ messages in thread
From: Christophe JAILLET @ 2024-05-14 20:53 UTC (permalink / raw)
  To: Yury Norov
  Cc: anna-maria, bristot, bsegall, cgroups, dietmar.eggemann,
	frederic, gregkh, hannes, imran.f.khan, juri.lelli, leobras,
	linux-kernel, lizefan.x, longman, mgorman, mingo, paulmck,
	peterz, rafael, riel, rostedt, tglx, tj, vincent.guittot,
	vschneid

Le 14/05/2024 à 00:01, Yury Norov a écrit :
> The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> even though cpu != i. In such case, cpumask_equal() would always return
> true, and we can proceed to the next CPU immediately.
> 
> Signed-off-by: Yury Norov <yury.norov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>   kernel/sched/topology.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 99ea5986038c..eb9eb17b0efa 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2360,7 +2360,7 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
>   	 * breaks the linking done for an earlier span.
>   	 */
>   	for_each_cpu(i, cpu_map) {
> -		if (i == cpu)
> +		if (i == cpu || tl->mask(cpu) == tl->mask(i))
>   			continue;
>   		/*
>   		 * We should 'and' all those masks with 'cpu_map' to exactly

Hi,

does it make sense to pre-compute tl->mask(cpu) outside the for_each_cpu()?

CJ

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/6] driver core: cpu: optimize print_cpus_isolated()
  2024-05-13 22:01 ` [PATCH 3/6] driver core: cpu: optimize print_cpus_isolated() Yury Norov
@ 2024-05-14 21:02   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2024-05-14 21:02 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Greg Kroah-Hartman, Paul E. McKenney,
	Rafael J. Wysocki, Anna-Maria Behnsen, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann,
	Frederic Weisbecker, Imran Khan, Ingo Molnar, Johannes Weiner,
	Juri Lelli, Leonardo Bras, Mel Gorman, Peter Zijlstra,
	Rik van Riel, Steven Rostedt, Tejun Heo, Valentin Schneider,
	Vincent Guittot, Waiman Long, Yury Norov, Zefan Li, cgroups

On Mon, May 13 2024 at 15:01, Yury Norov wrote:
> The function may be called with housekeeping_cpumask == cpu_possible_mask,

How so? There is no cpumask argument in the function signature. Can you
please be precise?

> and in such case the 'isolated' cpumask would be just empty.
>
> We can call cpumask_clear() in that case, and save CPU cycles.
>
> @@ -282,8 +282,10 @@ static ssize_t print_cpus_isolated(struct device *dev,
>  	if (!alloc_cpumask_var(&isolated, GFP_KERNEL))
>  		return -ENOMEM;
>  
> -	cpumask_andnot(isolated, cpu_possible_mask,
> -		       housekeeping_cpumask(HK_TYPE_DOMAIN));
> +	if (cpu_possible_mask != housekeeping_cpumask(HK_TYPE_DOMAIN))
> +		cpumask_andnot(isolated, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
> +	else
> +		cpumask_clear(isolated);
>  	len = sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(isolated));
>  
>  	free_cpumask_var(isolated);

Seriously? You need clear() to emit an empty string via %*pbl?

	if (cpu_possible_mask != housekeeping_cpumask(HK_TYPE_DOMAIN)) {
        	if (!alloc_cpumask_var(&isolated, GFP_KERNEL))
                	return -ENOMEM;
                cpumask_andnot(isolated, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
                len = sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(isolated));
	  	free_cpumask_var(isolated);
	} else {
        	len = sysfs_emit(buf, "\n");
        }

That actually would make sense and spare way more CPU cycles, no?

Is it actually worth the larger text size? Not really convinced about that.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-05-14 21:28 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-13 22:01 [PATCH 0/6] bitmap: optimize API usage Yury Norov
2024-05-13 22:01 ` [PATCH 1/6] smp: optimize smp_call_function_many_cond() Yury Norov
2024-05-13 22:01 ` [PATCH 2/6] sched/topology: optimize topology_span_sane() Yury Norov
2024-05-14 20:53   ` Christophe JAILLET
2024-05-13 22:01 ` [PATCH 3/6] driver core: cpu: optimize print_cpus_isolated() Yury Norov
2024-05-14 21:02   ` Thomas Gleixner
2024-05-13 22:01 ` [PATCH 4/6] genirq: optimize irq_do_set_affinity() Yury Norov
2024-05-14 12:51   ` Jinjie Ruan
2024-05-14 16:16     ` Yury Norov
2024-05-13 22:01 ` [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects() Yury Norov
2024-05-14 16:47   ` Waiman Long
2024-05-14 16:50     ` Tejun Heo
2024-05-14 16:55       ` Waiman Long
2024-05-13 22:01 ` [PATCH 6/6] tick/common: optimize cpumask_equal() usage Yury Norov
2024-05-14  8:31   ` Thomas Gleixner
2024-05-14  8:42     ` Thomas Gleixner
2024-05-14 16:47       ` Yury Norov
2024-05-14 20:47         ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).