* [PATCH v3 0/9] sched, net: NUMA-aware CPU spreading interface
@ 2022-08-25 18:12 Valentin Schneider
From: Valentin Schneider @ 2022-08-25 18:12 UTC
  To: netdev, linux-rdma, linux-kernel
  Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Yury Norov, Andy Shevchenko,
	Rasmus Villemoes, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Mel Gorman, Greg Kroah-Hartman,
	Heiko Carstens, Tony Luck, Jonathan Cameron, Gal Pressman,
	Tariq Toukan, Jesse Brandeburg

Hi folks,

Tariq pointed out in [1] that drivers allocating IRQ vectors would benefit
from having smarter NUMA-awareness (cpumask_local_spread() doesn't quite cut
it).

The interface proposed there involved an array of CPUs and a temporary cpumask.
Being my difficult self, what I'm proposing here instead is an interface that
doesn't require any temporary storage other than some stack variables (at the
cost of one wild macro).
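
To make the "no temporary storage" idea concrete, here is a minimal userspace
sketch. It is not the code from this series: the single-word masks, the
hop_mask[] table and spread_order() are all hypothetical stand-ins, and the
topology (CPUs 4-5 local, 2-3 one hop away, the rest two hops away) is made up.
It assumes the per-hop masks are cumulative, so the CPUs that are new at a given
hop are those set in the current mask and clear in the previous one. Only two
pointers and a counter live on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8u
#define NR_HOPS 3u

/*
 * Hypothetical topology as seen from one node: hop_mask[h] holds the CPUs
 * reachable within h hops (cumulative). Not taken from the series.
 */
static const uint64_t hop_mask[NR_HOPS] = { 0x30, 0x3c, 0xff };

/*
 * Write up to @max CPU numbers into @order, nearest hop first.
 * Returns how many entries were written.
 */
static unsigned int spread_order(unsigned int *order, unsigned int max)
{
	const uint64_t *prev = NULL;	/* mask of the previous hop, if any */
	unsigned int n = 0;

	assert(order != NULL);

	for (unsigned int hop = 0; hop < NR_HOPS; hop++) {
		const uint64_t *curr = &hop_mask[hop];

		for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
			/* Visit CPUs in (curr AND NOT prev), no temp mask. */
			if (!((*curr >> cpu) & 1))
				continue;
			if (prev && ((*prev >> cpu) & 1))
				continue;
			if (n == max)
				return n;
			order[n++] = cpu;
		}
		prev = curr;
	}
	return n;
}
```

A driver-like caller would ask for as many CPUs as it has IRQ vectors and get
the closest ones first; with the made-up table above, asking for three CPUs
yields 4, 5 and 2.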

Patch 9/9 is just there to showcase how the thing would be used. If this doesn't
get hated on, I'll let Tariq pick this up and push it with his networking driver
changes (with actual changelogs).

[1]: https://lore.kernel.org/all/20220728191203.4055-1-tariqt@nvidia.com/

A note on treewide use of for_each_cpu_andnot()
===============================================

I've used the below coccinelle script to find places that could be patched (I
couldn't figure out the valid syntax to patch from coccinelle itself):

,-----
@tmpandnot@
expression tmpmask;
iterator for_each_cpu;
position p;
statement S;
@@
cpumask_andnot(tmpmask, ...);

...

(
for_each_cpu@p(..., tmpmask, ...)
	S
|
for_each_cpu@p(..., tmpmask, ...)
{
	...
}
)

@script:python depends on tmpandnot@
p << tmpandnot.p;
@@
coccilib.report.print_report(p[0], "andnot loop here")
'-----

Which yields (against c40e8341e3b3):

.//arch/powerpc/kernel/smp.c:1587:1-13: andnot loop here
.//arch/powerpc/kernel/smp.c:1530:1-13: andnot loop here
.//arch/powerpc/kernel/smp.c:1440:1-13: andnot loop here
.//arch/powerpc/platforms/powernv/subcore.c:306:2-14: andnot loop here
.//arch/x86/kernel/apic/x2apic_cluster.c:62:1-13: andnot loop here
.//drivers/acpi/acpi_pad.c:110:1-13: andnot loop here
.//drivers/cpufreq/armada-8k-cpufreq.c:148:1-13: andnot loop here
.//drivers/cpufreq/powernv-cpufreq.c:931:1-13: andnot loop here
.//drivers/net/ethernet/sfc/efx_channels.c:73:1-13: andnot loop here
.//drivers/net/ethernet/sfc/siena/efx_channels.c:73:1-13: andnot loop here
.//kernel/sched/core.c:345:1-13: andnot loop here
.//kernel/sched/core.c:366:1-13: andnot loop here
.//net/core/dev.c:3058:1-13: andnot loop here

A lot of those are actually of the shape below, where the cpumask_andnot() sits
inside the loop body rather than in front of it:

  for_each_cpu(cpu, mask) {
      ...
      cpumask_andnot(mask, ...);
  }

I think *some* of the powerpc ones would be a match for for_each_cpu_andnot(),
but I decided to just stick to the one obvious one in __sched_core_flip().
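
For readers unfamiliar with the pattern being merged, the following userspace
sketch shows the general idea of iterating over "set in A and clear in B"
without first writing the result into a temporary mask. It is only an
illustration: next_andnot_bit(), for_each_bit_andnot() and visit_andnot() are
hypothetical names, the masks are assumed to fit in one 64-bit word, and the
kernel helpers introduced by this series operate on real bitmaps and may differ
in signature and details.

```c
#include <assert.h>
#include <stdint.h>

#define NR_BITS 64u	/* assumption: masks fit in one 64-bit word */

/*
 * Return the index of the first bit >= @start that is set in @a and clear
 * in @b, or NR_BITS if there is none.
 */
static unsigned int next_andnot_bit(uint64_t a, uint64_t b, unsigned int start)
{
	assert(start <= NR_BITS);

	for (unsigned int bit = start; bit < NR_BITS; bit++) {
		if (((a >> bit) & 1) && !((b >> bit) & 1))
			return bit;
	}
	return NR_BITS;
}

/* Iterate over every bit set in @a but not in @b; no temporary mask. */
#define for_each_bit_andnot(bit, a, b)				\
	for ((bit) = next_andnot_bit((a), (b), 0);		\
	     (bit) < NR_BITS;					\
	     (bit) = next_andnot_bit((a), (b), (bit) + 1))

/* Example user: OR together every bit the iterator visits. */
static uint64_t visit_andnot(uint64_t a, uint64_t b)
{
	uint64_t visited = 0;
	unsigned int cpu;

	for_each_bit_andnot(cpu, a, b)
		visited |= UINT64_C(1) << cpu;

	return visited;
}
```

With a = 0xff and b = 0x0f the loop body runs for bits 4 through 7 only, which
is what a cpumask_andnot() into a scratch mask followed by a plain iteration
over that scratch mask would have visited.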
  
Revisions
=========

v2 -> v3
++++++++

o Added for_each_cpu_and() and for_each_cpu_andnot() tests (Yury)
o New patches to fix issues raised by running the above
o New patch to use for_each_cpu_andnot() in sched/core.c (Yury)

v1 -> v2
++++++++

o Split _find_next_bit() @invert into @invert1 and @invert2 (Yury)
o Rebase onto v6.0-rc1

Cheers,
Valentin

Valentin Schneider (9):
  cpumask: Make cpumask_full() check for nr_cpu_ids bits
  lib/test_cpumask: Make test_cpumask_last check for nr_cpu_ids bits
  bitops: Introduce find_next_andnot_bit()
  cpumask: Introduce for_each_cpu_andnot()
  lib/test_cpumask: Add for_each_cpu_and(not) tests
  sched/core: Merge cpumask_andnot()+for_each_cpu() into
    for_each_cpu_andnot()
  sched/topology: Introduce sched_numa_hop_mask()
  sched/topology: Introduce for_each_numa_hop_cpu()
  SHOWCASE: net/mlx5e: Leverage for_each_numa_hop_cpu()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 12 ++++-
 include/linux/cpumask.h                      | 41 ++++++++++++++++-
 include/linux/find.h                         | 44 ++++++++++++++++---
 include/linux/topology.h                     | 46 ++++++++++++++++++++
 kernel/sched/core.c                          |  5 +--
 kernel/sched/topology.c                      | 28 ++++++++++++
 lib/find_bit.c                               | 23 +++++-----
 lib/test_cpumask.c                           | 23 +++++++++-
 8 files changed, 196 insertions(+), 26 deletions(-)

--
2.31.1

