* [PATCH net-next V2 0/2] mlx5: Use NUMA distance metrics
@ 2022-07-18 12:43 Tariq Toukan
  2022-07-18 12:43 ` [PATCH net-next V2 1/2] sched/topology: Expose sched_numa_find_closest Tariq Toukan
  2022-07-18 12:43 ` [PATCH net-next V2 2/2] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints Tariq Toukan
  0 siblings, 2 replies; 8+ messages in thread
From: Tariq Toukan @ 2022-07-18 12:43 UTC (permalink / raw)
  To: David S. Miller, Saeed Mahameed, Jakub Kicinski, Ingo Molnar,
	Peter Zijlstra, Juri Lelli
  Cc: Eric Dumazet, Paolo Abeni, netdev, Gal Pressman, Vincent Guittot,
	linux-kernel, Tariq Toukan

Hi,

Expose the scheduler's sched_numa_find_closest() function and use it in
the mlx5 device driver.  This replaces the binary NUMA preference (local /
remote) with one that takes the actual distances into account, so that
nearby remote NUMA nodes are preferred over farther ones.

This has significant performance implications when NUMA-aware memory
allocations are used, improving both throughput and CPU utilization.
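
To make the difference concrete (an illustrative sketch only, not code
from the series): the old behavior only distinguishes "local node" from
"any remote node", while the new one ranks remote candidates by
node_distance(), the generic kernel NUMA distance accessor, so a node at
distance 11 wins over one at distance 32:

    /* Illustration only: prefer the remote node with the smaller
     * node_distance() to the device's node (see the numactl -H table
     * in patch 2 for the values on the test machine).
     */
    static int closer_remote_node(int dev_node, int node_a, int node_b)
    {
            return node_distance(dev_node, node_a) <=
                   node_distance(dev_node, node_b) ? node_a : node_b;
    }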

Regards,
Tariq

Tariq Toukan (2):
  sched/topology: Expose sched_numa_find_closest
  net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity
    hints

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 62 +++++++++++++++++++-
 include/linux/sched/topology.h               |  2 +
 kernel/sched/topology.c                      |  1 +
 3 files changed, 62 insertions(+), 3 deletions(-)

-- 
2.21.0



* [PATCH net-next V2 1/2] sched/topology: Expose sched_numa_find_closest
  2022-07-18 12:43 [PATCH net-next V2 0/2] mlx5: Use NUMA distance metrics Tariq Toukan
@ 2022-07-18 12:43 ` Tariq Toukan
  2022-07-18 13:47   ` Peter Zijlstra
  2022-07-18 12:43 ` [PATCH net-next V2 2/2] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints Tariq Toukan
  1 sibling, 1 reply; 8+ messages in thread
From: Tariq Toukan @ 2022-07-18 12:43 UTC (permalink / raw)
  To: David S. Miller, Saeed Mahameed, Jakub Kicinski, Ingo Molnar,
	Peter Zijlstra, Juri Lelli
  Cc: Eric Dumazet, Paolo Abeni, netdev, Gal Pressman, Vincent Guittot,
	linux-kernel, Tariq Toukan

This logic can help device drivers prefer some remote cpus
over others, according to the NUMA distance metrics.
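
For example (an illustrative sketch, not part of this patch; patch 2 of
the series contains the actual driver usage), a driver can repeatedly ask
for the online CPU closest to a reference CPU on its home node, removing
each answer from the candidate mask so the search moves outward:

    /* 'avail' is a cpumask_var_t already allocated by the caller;
     * 'ref_cpu' is some CPU on the device's home NUMA node.
     */
    cpumask_copy(avail, cpu_online_mask);

    cpu = sched_numa_find_closest(avail, ref_cpu);
    if (cpu < nr_cpu_ids) {                 /* nr_cpu_ids means "none found" */
            /* use 'cpu' for the next IRQ/queue ... */
            __cpumask_clear_cpu(cpu, avail);
    }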

Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 include/linux/sched/topology.h | 2 ++
 kernel/sched/topology.c        | 1 +
 2 files changed, 3 insertions(+)

v2:
Replaced EXPORT_SYMBOL with EXPORT_SYMBOL_GPL, per Peter's comment.

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 56cffe42abbc..d467c30bdbb9 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -61,6 +61,8 @@ static inline int cpu_numa_flags(void)
 {
 	return SD_NUMA;
 }
+
+int sched_numa_find_closest(const struct cpumask *cpus, int cpu);
 #endif
 
 extern int arch_asym_cpu_priority(int cpu);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 05b6c2ad90b9..274fb2bd3849 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2066,6 +2066,7 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
 
 	return found;
 }
+EXPORT_SYMBOL_GPL(sched_numa_find_closest);
 
 #endif /* CONFIG_NUMA */
 
-- 
2.21.0



* [PATCH net-next V2 2/2] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
  2022-07-18 12:43 [PATCH net-next V2 0/2] mlx5: Use NUMA distance metrics Tariq Toukan
  2022-07-18 12:43 ` [PATCH net-next V2 1/2] sched/topology: Expose sched_numa_find_closest Tariq Toukan
@ 2022-07-18 12:43 ` Tariq Toukan
  2022-07-18 13:50   ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Tariq Toukan @ 2022-07-18 12:43 UTC (permalink / raw)
  To: David S. Miller, Saeed Mahameed, Jakub Kicinski, Ingo Molnar,
	Peter Zijlstra, Juri Lelli
  Cc: Eric Dumazet, Paolo Abeni, netdev, Gal Pressman, Vincent Guittot,
	linux-kernel, Tariq Toukan

In the IRQ affinity hints, replace the binary NUMA preference (local /
remote) with one that takes the actual distances into account, so that
nearby remote NUMA nodes are preferred over farther ones.

This has significant performance implications when NUMA-aware allocated
memory is used (see [1] and its derivatives, for example).

[1]
drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
   int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
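
A sketch of how [1] translates into NUMA-aware allocation (simplified;
the exact mlx5e call sites differ slightly): the channel structure is
allocated on the node of the CPU taken from the IRQ affinity hint, so a
closer hinted CPU also means closer memory.

    int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
    struct mlx5e_channel *c;

    c = kvzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu));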

Performance tests:

TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121

+-------------------------+-----------+------------------+------------------+
|                         | BW (Gbps) | TX side CPU util | RX side CPU util |
+-------------------------+-----------+------------------+------------------+
| Baseline                | 52.3      | 6.4 %            | 17.9 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
+-------------------------+-----------+------------------+------------------+

The bottleneck on the RX side is removed and line rate is reached
(~1.8x speedup).  CPU utilization on the TX side drops by ~30%.

* CPU utilization is measured on the active cores only.

Setup details (similar on both sides):

NIC: ConnectX-6 Dx dual port, 100 Gbps each.
A single port is used in the tests.

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           2
NUMA node(s):        16
Vendor ID:           AuthenticAMD
CPU family:          25
Model:               1
Model name:          AMD EPYC 7763 64-Core Processor
Stepping:            1
CPU MHz:             2594.804
BogoMIPS:            4890.73
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-7,128-135
NUMA node1 CPU(s):   8-15,136-143
NUMA node2 CPU(s):   16-23,144-151
NUMA node3 CPU(s):   24-31,152-159
NUMA node4 CPU(s):   32-39,160-167
NUMA node5 CPU(s):   40-47,168-175
NUMA node6 CPU(s):   48-55,176-183
NUMA node7 CPU(s):   56-63,184-191
NUMA node8 CPU(s):   64-71,192-199
NUMA node9 CPU(s):   72-79,200-207
NUMA node10 CPU(s):  80-87,208-215
NUMA node11 CPU(s):  88-95,216-223
NUMA node12 CPU(s):  96-103,224-231
NUMA node13 CPU(s):  104-111,232-239
NUMA node14 CPU(s):  112-119,240-247
NUMA node15 CPU(s):  120-127,248-255
..

$ numactl -H
..
node distances:
node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  0:  10  11  11  11  12  12  12  12  32  32  32  32  32  32  32  32
  1:  11  10  11  11  12  12  12  12  32  32  32  32  32  32  32  32
  2:  11  11  10  11  12  12  12  12  32  32  32  32  32  32  32  32
  3:  11  11  11  10  12  12  12  12  32  32  32  32  32  32  32  32
  4:  12  12  12  12  10  11  11  11  32  32  32  32  32  32  32  32
  5:  12  12  12  12  11  10  11  11  32  32  32  32  32  32  32  32
  6:  12  12  12  12  11  11  10  11  32  32  32  32  32  32  32  32
  7:  12  12  12  12  11  11  11  10  32  32  32  32  32  32  32  32
  8:  32  32  32  32  32  32  32  32  10  11  11  11  12  12  12  12
  9:  32  32  32  32  32  32  32  32  11  10  11  11  12  12  12  12
 10:  32  32  32  32  32  32  32  32  11  11  10  11  12  12  12  12
 11:  32  32  32  32  32  32  32  32  11  11  11  10  12  12  12  12
 12:  32  32  32  32  32  32  32  32  12  12  12  12  10  11  11  11
 13:  32  32  32  32  32  32  32  32  12  12  12  12  11  10  11  11
 14:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  10  11
 15:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  11  10

$ cat /sys/class/net/ens5f0/device/numa_node
14

Affinity hints (127 IRQs), each shown as comma-separated 32-bit hex
cpumask words, highest CPUs on the left:
Before (once the local node 14 is exhausted, the hints simply continue
from CPU 0 upwards, i.e. onto distance-32 nodes):
331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
347: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
348: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
349: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004
350: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008
351: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000010
352: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000020
353: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000040
354: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080
355: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
356: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000200
357: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000400
358: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000800
359: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00001000
360: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000
361: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00004000
362: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000
363: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000
364: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00020000
365: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00040000
366: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000
367: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00100000
368: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00200000
369: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00400000
370: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00800000
371: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,01000000
372: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,02000000
373: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,04000000
374: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,08000000
375: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,10000000
376: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000
377: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,40000000
378: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,80000000
379: 00000000,00000000,00000000,00000000,00000000,00000000,00000001,00000000
380: 00000000,00000000,00000000,00000000,00000000,00000000,00000002,00000000
381: 00000000,00000000,00000000,00000000,00000000,00000000,00000004,00000000
382: 00000000,00000000,00000000,00000000,00000000,00000000,00000008,00000000
383: 00000000,00000000,00000000,00000000,00000000,00000000,00000010,00000000
384: 00000000,00000000,00000000,00000000,00000000,00000000,00000020,00000000
385: 00000000,00000000,00000000,00000000,00000000,00000000,00000040,00000000
386: 00000000,00000000,00000000,00000000,00000000,00000000,00000080,00000000
387: 00000000,00000000,00000000,00000000,00000000,00000000,00000100,00000000
388: 00000000,00000000,00000000,00000000,00000000,00000000,00000200,00000000
389: 00000000,00000000,00000000,00000000,00000000,00000000,00000400,00000000
390: 00000000,00000000,00000000,00000000,00000000,00000000,00000800,00000000
391: 00000000,00000000,00000000,00000000,00000000,00000000,00001000,00000000
392: 00000000,00000000,00000000,00000000,00000000,00000000,00002000,00000000
393: 00000000,00000000,00000000,00000000,00000000,00000000,00004000,00000000
394: 00000000,00000000,00000000,00000000,00000000,00000000,00008000,00000000
395: 00000000,00000000,00000000,00000000,00000000,00000000,00010000,00000000
396: 00000000,00000000,00000000,00000000,00000000,00000000,00020000,00000000
397: 00000000,00000000,00000000,00000000,00000000,00000000,00040000,00000000
398: 00000000,00000000,00000000,00000000,00000000,00000000,00080000,00000000
399: 00000000,00000000,00000000,00000000,00000000,00000000,00100000,00000000
400: 00000000,00000000,00000000,00000000,00000000,00000000,00200000,00000000
401: 00000000,00000000,00000000,00000000,00000000,00000000,00400000,00000000
402: 00000000,00000000,00000000,00000000,00000000,00000000,00800000,00000000
403: 00000000,00000000,00000000,00000000,00000000,00000000,01000000,00000000
404: 00000000,00000000,00000000,00000000,00000000,00000000,02000000,00000000
405: 00000000,00000000,00000000,00000000,00000000,00000000,04000000,00000000
406: 00000000,00000000,00000000,00000000,00000000,00000000,08000000,00000000
407: 00000000,00000000,00000000,00000000,00000000,00000000,10000000,00000000
408: 00000000,00000000,00000000,00000000,00000000,00000000,20000000,00000000
409: 00000000,00000000,00000000,00000000,00000000,00000000,40000000,00000000
410: 00000000,00000000,00000000,00000000,00000000,00000000,80000000,00000000
411: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
412: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
413: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
414: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
415: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
416: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
417: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
418: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
419: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
420: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
421: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
422: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
423: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
424: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
425: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
426: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
427: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
428: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
429: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
430: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
431: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
432: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
433: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
434: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
435: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
436: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
437: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
438: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
439: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
440: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
441: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
442: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
443: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
444: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
445: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
446: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
447: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
448: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
449: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
450: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
451: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
452: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
453: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
454: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
455: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
456: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
457: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000

After (the hints follow increasing NUMA distance from node 14: its own
CPUs first, then the distance-11 nodes 12, 13 and 15, then the
distance-12 nodes 8-11):
331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
347: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
348: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
349: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
350: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
351: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
352: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
353: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
354: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
355: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
356: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
357: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
358: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
359: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
360: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
361: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000
362: 00000000,00000000,00000000,00000000,00008000,00000000,00000000,00000000
363: 00000000,00000000,00000000,00000000,01000000,00000000,00000000,00000000
364: 00000000,00000000,00000000,00000000,02000000,00000000,00000000,00000000
365: 00000000,00000000,00000000,00000000,04000000,00000000,00000000,00000000
366: 00000000,00000000,00000000,00000000,08000000,00000000,00000000,00000000
367: 00000000,00000000,00000000,00000000,10000000,00000000,00000000,00000000
368: 00000000,00000000,00000000,00000000,20000000,00000000,00000000,00000000
369: 00000000,00000000,00000000,00000000,40000000,00000000,00000000,00000000
370: 00000000,00000000,00000000,00000000,80000000,00000000,00000000,00000000
371: 00000001,00000000,00000000,00000000,00000000,00000000,00000000,00000000
372: 00000002,00000000,00000000,00000000,00000000,00000000,00000000,00000000
373: 00000004,00000000,00000000,00000000,00000000,00000000,00000000,00000000
374: 00000008,00000000,00000000,00000000,00000000,00000000,00000000,00000000
375: 00000010,00000000,00000000,00000000,00000000,00000000,00000000,00000000
376: 00000020,00000000,00000000,00000000,00000000,00000000,00000000,00000000
377: 00000040,00000000,00000000,00000000,00000000,00000000,00000000,00000000
378: 00000080,00000000,00000000,00000000,00000000,00000000,00000000,00000000
379: 00000100,00000000,00000000,00000000,00000000,00000000,00000000,00000000
380: 00000200,00000000,00000000,00000000,00000000,00000000,00000000,00000000
381: 00000400,00000000,00000000,00000000,00000000,00000000,00000000,00000000
382: 00000800,00000000,00000000,00000000,00000000,00000000,00000000,00000000
383: 00001000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
384: 00002000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
385: 00004000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
386: 00008000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
387: 01000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
388: 02000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
389: 04000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
390: 08000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
391: 10000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
392: 20000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
393: 40000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
394: 80000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
395: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
396: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
397: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
398: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
399: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
400: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
401: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
402: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
403: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
404: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
405: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
406: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
407: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
408: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
409: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
410: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
411: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
412: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
413: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
414: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
415: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
416: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
417: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
418: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
419: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
420: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
421: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
422: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
423: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
424: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
425: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
426: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
427: 00000000,00000001,00000000,00000000,00000000,00000000,00000000,00000000
428: 00000000,00000002,00000000,00000000,00000000,00000000,00000000,00000000
429: 00000000,00000004,00000000,00000000,00000000,00000000,00000000,00000000
430: 00000000,00000008,00000000,00000000,00000000,00000000,00000000,00000000
431: 00000000,00000010,00000000,00000000,00000000,00000000,00000000,00000000
432: 00000000,00000020,00000000,00000000,00000000,00000000,00000000,00000000
433: 00000000,00000040,00000000,00000000,00000000,00000000,00000000,00000000
434: 00000000,00000080,00000000,00000000,00000000,00000000,00000000,00000000
435: 00000000,00000100,00000000,00000000,00000000,00000000,00000000,00000000
436: 00000000,00000200,00000000,00000000,00000000,00000000,00000000,00000000
437: 00000000,00000400,00000000,00000000,00000000,00000000,00000000,00000000
438: 00000000,00000800,00000000,00000000,00000000,00000000,00000000,00000000
439: 00000000,00001000,00000000,00000000,00000000,00000000,00000000,00000000
440: 00000000,00002000,00000000,00000000,00000000,00000000,00000000,00000000
441: 00000000,00004000,00000000,00000000,00000000,00000000,00000000,00000000
442: 00000000,00008000,00000000,00000000,00000000,00000000,00000000,00000000
443: 00000000,00010000,00000000,00000000,00000000,00000000,00000000,00000000
444: 00000000,00020000,00000000,00000000,00000000,00000000,00000000,00000000
445: 00000000,00040000,00000000,00000000,00000000,00000000,00000000,00000000
446: 00000000,00080000,00000000,00000000,00000000,00000000,00000000,00000000
447: 00000000,00100000,00000000,00000000,00000000,00000000,00000000,00000000
448: 00000000,00200000,00000000,00000000,00000000,00000000,00000000,00000000
449: 00000000,00400000,00000000,00000000,00000000,00000000,00000000,00000000
450: 00000000,00800000,00000000,00000000,00000000,00000000,00000000,00000000
451: 00000000,01000000,00000000,00000000,00000000,00000000,00000000,00000000
452: 00000000,02000000,00000000,00000000,00000000,00000000,00000000,00000000
453: 00000000,04000000,00000000,00000000,00000000,00000000,00000000,00000000
454: 00000000,08000000,00000000,00000000,00000000,00000000,00000000,00000000
455: 00000000,10000000,00000000,00000000,00000000,00000000,00000000,00000000
456: 00000000,20000000,00000000,00000000,00000000,00000000,00000000,00000000
457: 00000000,40000000,00000000,00000000,00000000,00000000,00000000,00000000

Reviewed-by: Gal Pressman <gal@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 62 +++++++++++++++++++-
 1 file changed, 59 insertions(+), 3 deletions(-)

v2:
Separated the set_cpu operation into two functions, per Saeed's suggestion.
Added Saeed's Acked-by signature.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 229728c80233..e72bdaaad84f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -11,6 +11,9 @@
 #ifdef CONFIG_RFS_ACCEL
 #include <linux/cpu_rmap.h>
 #endif
+#ifdef CONFIG_NUMA
+#include <linux/sched/topology.h>
+#endif
 #include "mlx5_core.h"
 #include "lib/eq.h"
 #include "fpga/core.h"
@@ -806,13 +809,67 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
 	kfree(table->comp_irqs);
 }
 
+static void set_cpus_by_local_spread(struct mlx5_core_dev *dev, u16 *cpus,
+				     int ncomp_eqs)
+{
+	int i;
+
+	for (i = 0; i < ncomp_eqs; i++)
+		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
+}
+
+static bool set_cpus_by_numa_distance(struct mlx5_core_dev *dev, u16 *cpus,
+				      int ncomp_eqs)
+{
+#ifdef CONFIG_NUMA
+	cpumask_var_t cpumask;
+	int first;
+	int i;
+
+	if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL)) {
+		mlx5_core_err(dev, "zalloc_cpumask_var failed\n");
+		return false;
+	}
+	cpumask_copy(cpumask, cpu_online_mask);
+
+	first = cpumask_local_spread(0, dev->priv.numa_node);
+
+	for (i = 0; i < ncomp_eqs; i++) {
+		int cpu;
+
+		cpu = sched_numa_find_closest(cpumask, first);
+		if (cpu >= nr_cpu_ids) {
+			mlx5_core_err(dev, "sched_numa_find_closest failed, cpu(%d) >= nr_cpu_ids(%d)\n",
+				      cpu, nr_cpu_ids);
+
+			free_cpumask_var(cpumask);
+			return false;
+		}
+		cpus[i] = cpu;
+		cpumask_clear_cpu(cpu, cpumask);
+	}
+
+	free_cpumask_var(cpumask);
+	return true;
+#else
+	return false;
+#endif
+}
+
+static void mlx5_set_eqs_cpus(struct mlx5_core_dev *dev, u16 *cpus, int ncomp_eqs)
+{
+	bool success = set_cpus_by_numa_distance(dev, cpus, ncomp_eqs);
+
+	if (!success)
+		set_cpus_by_local_spread(dev, cpus, ncomp_eqs);
+}
+
 static int comp_irqs_request(struct mlx5_core_dev *dev)
 {
 	struct mlx5_eq_table *table = dev->priv.eq_table;
 	int ncomp_eqs = table->num_comp_eqs;
 	u16 *cpus;
 	int ret;
-	int i;
 
 	ncomp_eqs = table->num_comp_eqs;
 	table->comp_irqs = kcalloc(ncomp_eqs, sizeof(*table->comp_irqs), GFP_KERNEL);
@@ -830,8 +887,7 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
 		ret = -ENOMEM;
 		goto free_irqs;
 	}
-	for (i = 0; i < ncomp_eqs; i++)
-		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
+	mlx5_set_eqs_cpus(dev, cpus, ncomp_eqs);
 	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
 	kfree(cpus);
 	if (ret < 0)
-- 
2.21.0



* Re: [PATCH net-next V2 1/2] sched/topology: Expose sched_numa_find_closest
  2022-07-18 12:43 ` [PATCH net-next V2 1/2] sched/topology: Expose sched_numa_find_closest Tariq Toukan
@ 2022-07-18 13:47   ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2022-07-18 13:47 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Saeed Mahameed, Jakub Kicinski, Ingo Molnar,
	Juri Lelli, Eric Dumazet, Paolo Abeni, netdev, Gal Pressman,
	Vincent Guittot, linux-kernel

On Mon, Jul 18, 2022 at 03:43:14PM +0300, Tariq Toukan wrote:
> This logic can help device drivers prefer some remote cpus
> over others, according to the NUMA distance metrics.
> 
> Reviewed-by: Gal Pressman <gal@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> ---
>  include/linux/sched/topology.h | 2 ++
>  kernel/sched/topology.c        | 1 +
>  2 files changed, 3 insertions(+)
> 
> v2:
> Replaced EXPORT_SYMBOL with EXPORT_SYMBOL_GPL, per Peter's comment.
> 
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 56cffe42abbc..d467c30bdbb9 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -61,6 +61,8 @@ static inline int cpu_numa_flags(void)
>  {
>  	return SD_NUMA;
>  }
> +
> +int sched_numa_find_closest(const struct cpumask *cpus, int cpu);
>  #endif
>  
>  extern int arch_asym_cpu_priority(int cpu);
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 05b6c2ad90b9..274fb2bd3849 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2066,6 +2066,7 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
>  
>  	return found;
>  }
> +EXPORT_SYMBOL_GPL(sched_numa_find_closest);
>  
>  #endif /* CONFIG_NUMA */
>  
> -- 
> 2.21.0
> 


* Re: [PATCH net-next V2 2/2] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
  2022-07-18 12:43 ` [PATCH net-next V2 2/2] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints Tariq Toukan
@ 2022-07-18 13:50   ` Peter Zijlstra
  2022-07-18 19:49     ` Tariq Toukan
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2022-07-18 13:50 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Saeed Mahameed, Jakub Kicinski, Ingo Molnar,
	Juri Lelli, Eric Dumazet, Paolo Abeni, netdev, Gal Pressman,
	Vincent Guittot, linux-kernel

On Mon, Jul 18, 2022 at 03:43:15PM +0300, Tariq Toukan wrote:

> Reviewed-by: Gal Pressman <gal@nvidia.com>
> Acked-by: Saeed Mahameed <saeedm@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c | 62 +++++++++++++++++++-
>  1 file changed, 59 insertions(+), 3 deletions(-)
> 
> v2:
> Separated the set_cpu operation into two functions, per Saeed's suggestion.
> Added Saeed's Acked-by signature.
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 229728c80233..e72bdaaad84f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -11,6 +11,9 @@
>  #ifdef CONFIG_RFS_ACCEL
>  #include <linux/cpu_rmap.h>
>  #endif
> +#ifdef CONFIG_NUMA
> +#include <linux/sched/topology.h>
> +#endif
>  #include "mlx5_core.h"
>  #include "lib/eq.h"
>  #include "fpga/core.h"
> @@ -806,13 +809,67 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
>  	kfree(table->comp_irqs);
>  }
>  
> +static void set_cpus_by_local_spread(struct mlx5_core_dev *dev, u16 *cpus,
> +				     int ncomp_eqs)
> +{
> +	int i;
> +
> +	for (i = 0; i < ncomp_eqs; i++)
> +		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +}
> +
> +static bool set_cpus_by_numa_distance(struct mlx5_core_dev *dev, u16 *cpus,
> +				      int ncomp_eqs)
> +{
> +#ifdef CONFIG_NUMA
> +	cpumask_var_t cpumask;
> +	int first;
> +	int i;
> +
> +	if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL)) {
> +		mlx5_core_err(dev, "zalloc_cpumask_var failed\n");
> +		return false;
> +	}
> +	cpumask_copy(cpumask, cpu_online_mask);
> +
> +	first = cpumask_local_spread(0, dev->priv.numa_node);

Arguably you want something like:

	first = cpumask_any(cpumask_of_node(dev->priv.numa_node));

> +
> +	for (i = 0; i < ncomp_eqs; i++) {
> +		int cpu;
> +
> +		cpu = sched_numa_find_closest(cpumask, first);
> +		if (cpu >= nr_cpu_ids) {
> +			mlx5_core_err(dev, "sched_numa_find_closest failed, cpu(%d) >= nr_cpu_ids(%d)\n",
> +				      cpu, nr_cpu_ids);
> +
> +			free_cpumask_var(cpumask);
> +			return false;

So this will fail when ncomp_eqs > cpumask_weight(online_cpus); is that
desired?

> +		}
> +		cpus[i] = cpu;
> +		cpumask_clear_cpu(cpu, cpumask);

Since there is no concurrency on this cpumask, you don't need atomic
ops:

		__cpumask_clear_cpu(..);

> +	}
> +
> +	free_cpumask_var(cpumask);
> +	return true;
> +#else
> +	return false;
> +#endif
> +}
> +
> +static void mlx5_set_eqs_cpus(struct mlx5_core_dev *dev, u16 *cpus, int ncomp_eqs)
> +{
> +	bool success = set_cpus_by_numa_distance(dev, cpus, ncomp_eqs);
> +
> +	if (!success)
> +		set_cpus_by_local_spread(dev, cpus, ncomp_eqs);
> +}
> +
>  static int comp_irqs_request(struct mlx5_core_dev *dev)
>  {
>  	struct mlx5_eq_table *table = dev->priv.eq_table;
>  	int ncomp_eqs = table->num_comp_eqs;
>  	u16 *cpus;
>  	int ret;
> -	int i;
>  
>  	ncomp_eqs = table->num_comp_eqs;
>  	table->comp_irqs = kcalloc(ncomp_eqs, sizeof(*table->comp_irqs), GFP_KERNEL);
> @@ -830,8 +887,7 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
>  		ret = -ENOMEM;
>  		goto free_irqs;
>  	}
> -	for (i = 0; i < ncomp_eqs; i++)
> -		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +	mlx5_set_eqs_cpus(dev, cpus, ncomp_eqs);

So you change this for mlx5, what about the other users of
cpumask_local_spread() ?


* Re: [PATCH net-next V2 2/2] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
  2022-07-18 13:50   ` Peter Zijlstra
@ 2022-07-18 19:49     ` Tariq Toukan
  2022-07-18 21:24       ` Peter Zijlstra
  2022-07-18 21:57       ` Jakub Kicinski
  0 siblings, 2 replies; 8+ messages in thread
From: Tariq Toukan @ 2022-07-18 19:49 UTC (permalink / raw)
  To: Peter Zijlstra, Tariq Toukan
  Cc: David S. Miller, Saeed Mahameed, Jakub Kicinski, Ingo Molnar,
	Juri Lelli, Eric Dumazet, Paolo Abeni, netdev, Gal Pressman,
	Vincent Guittot, linux-kernel



On 7/18/2022 4:50 PM, Peter Zijlstra wrote:
> On Mon, Jul 18, 2022 at 03:43:15PM +0300, Tariq Toukan wrote:
> 
>> Reviewed-by: Gal Pressman <gal@nvidia.com>
>> Acked-by: Saeed Mahameed <saeedm@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/eq.c | 62 +++++++++++++++++++-
>>   1 file changed, 59 insertions(+), 3 deletions(-)
>>
>> v2:
>> Separated the set_cpu operation into two functions, per Saeed's suggestion.
>> Added Saeed's Acked-by signature.
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> index 229728c80233..e72bdaaad84f 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> @@ -11,6 +11,9 @@
>>   #ifdef CONFIG_RFS_ACCEL
>>   #include <linux/cpu_rmap.h>
>>   #endif
>> +#ifdef CONFIG_NUMA
>> +#include <linux/sched/topology.h>
>> +#endif
>>   #include "mlx5_core.h"
>>   #include "lib/eq.h"
>>   #include "fpga/core.h"
>> @@ -806,13 +809,67 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
>>   	kfree(table->comp_irqs);
>>   }
>>   
>> +static void set_cpus_by_local_spread(struct mlx5_core_dev *dev, u16 *cpus,
>> +				     int ncomp_eqs)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < ncomp_eqs; i++)
>> +		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
>> +}
>> +
>> +static bool set_cpus_by_numa_distance(struct mlx5_core_dev *dev, u16 *cpus,
>> +				      int ncomp_eqs)
>> +{
>> +#ifdef CONFIG_NUMA
>> +	cpumask_var_t cpumask;
>> +	int first;
>> +	int i;
>> +
>> +	if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL)) {
>> +		mlx5_core_err(dev, "zalloc_cpumask_var failed\n");
>> +		return false;
>> +	}
>> +	cpumask_copy(cpumask, cpu_online_mask);
>> +
>> +	first = cpumask_local_spread(0, dev->priv.numa_node);
> 
> Arguably you want something like:
> 
> 	first = cpumask_any(cpumask_of_node(dev->priv.numa_node));

Any doesn't sound like what I'm looking for, I'm looking for first.
I do care about the order within the node, so it's more like 
cpumask_first(cpumask_of_node(dev->priv.numa_node));

Do you think this has any advantage over cpumask_local_spread, if used 
only during the setup phase of the driver?

> 
>> +
>> +	for (i = 0; i < ncomp_eqs; i++) {
>> +		int cpu;
>> +
>> +		cpu = sched_numa_find_closest(cpumask, first);
>> +		if (cpu >= nr_cpu_ids) {
>> +			mlx5_core_err(dev, "sched_numa_find_closest failed, cpu(%d) >= nr_cpu_ids(%d)\n",
>> +				      cpu, nr_cpu_ids);
>> +
>> +			free_cpumask_var(cpumask);
>> +			return false;
> 
> So this will fail when ncomp_eqs > cpumask_weight(online_cpus); is that
> desired?
> 

Yes. ncomp_eqs does not exceed the number of online cores.


>> +		}
>> +		cpus[i] = cpu;
>> +		cpumask_clear_cpu(cpu, cpumask);
> 
> Since there is no concurrency on this cpumask, you don't need atomic
> ops:
> 
> 		__cpumask_clear_cpu(..);
> 

Right. I'll fix.

>> +	}
>> +
>> +	free_cpumask_var(cpumask);
>> +	return true;
>> +#else
>> +	return false;
>> +#endif
>> +}
>> +
>> +static void mlx5_set_eqs_cpus(struct mlx5_core_dev *dev, u16 *cpus, int ncomp_eqs)
>> +{
>> +	bool success = set_cpus_by_numa_distance(dev, cpus, ncomp_eqs);
>> +
>> +	if (!success)
>> +		set_cpus_by_local_spread(dev, cpus, ncomp_eqs);
>> +}
>> +
>>   static int comp_irqs_request(struct mlx5_core_dev *dev)
>>   {
>>   	struct mlx5_eq_table *table = dev->priv.eq_table;
>>   	int ncomp_eqs = table->num_comp_eqs;
>>   	u16 *cpus;
>>   	int ret;
>> -	int i;
>>   
>>   	ncomp_eqs = table->num_comp_eqs;
>>   	table->comp_irqs = kcalloc(ncomp_eqs, sizeof(*table->comp_irqs), GFP_KERNEL);
>> @@ -830,8 +887,7 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
>>   		ret = -ENOMEM;
>>   		goto free_irqs;
>>   	}
>> -	for (i = 0; i < ncomp_eqs; i++)
>> -		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
>> +	mlx5_set_eqs_cpus(dev, cpus, ncomp_eqs);
> 
> So you change this for mlx5, what about the other users of
> cpumask_local_spread() ?

I took a look at the different netdev users.
While some users have similar use case to ours (affinity hints), many 
others use cpumask_local_spread in other flows (XPS setting, ring 
allocations, etc..).

Moving them to use the newly exposed API needs some deeper dive into 
their code, especially due to the possible undesired side-effects.

I prefer not to include these changes in my series for now, but probably 
contribute it in a followup work.

Regards,
Tariq


* Re: [PATCH net-next V2 2/2] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
  2022-07-18 19:49     ` Tariq Toukan
@ 2022-07-18 21:24       ` Peter Zijlstra
  2022-07-18 21:57       ` Jakub Kicinski
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2022-07-18 21:24 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Tariq Toukan, David S. Miller, Saeed Mahameed, Jakub Kicinski,
	Ingo Molnar, Juri Lelli, Eric Dumazet, Paolo Abeni, netdev,
	Gal Pressman, Vincent Guittot, linux-kernel

On Mon, Jul 18, 2022 at 10:49:21PM +0300, Tariq Toukan wrote:

> > > +	first = cpumask_local_spread(0, dev->priv.numa_node);
> > 
> > Arguably you want something like:
> > 
> > 	first = cpumask_any(cpumask_of_node(dev->priv.numa_node));
> 
> Any doesn't sound like what I'm looking for, I'm looking for first.
> I do care about the order within the node, so it's more like
> cpumask_first(cpumask_of_node(dev->priv.numa_node));
> 
> Do you think this has any advantage over cpumask_local_spread, if used only
> during the setup phase of the driver?

Only for the poor sod trying to read this code ;-) That is, I had no
idea what cpumask_local_spread() does, while cpumask_first() is fairly
obvious.

> > > @@ -830,8 +887,7 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
> > >   		ret = -ENOMEM;
> > >   		goto free_irqs;
> > >   	}
> > > -	for (i = 0; i < ncomp_eqs; i++)
> > > -		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> > > +	mlx5_set_eqs_cpus(dev, cpus, ncomp_eqs);
> > 
> > So you change this for mlx5, what about the other users of
> > cpumask_local_spread() ?
> 
> I took a look at the different netdev users.
> While some users have similar use case to ours (affinity hints), many others
> use cpumask_local_spread in other flows (XPS setting, ring allocations,
> etc..).
> 
> Moving them to use the newly exposed API needs some deeper dive into their
> code, especially due to the possible undesired side-effects.
> 
> I prefer not to include these changes in my series for now, but probably
> contribute it in a followup work.

Fair enough.


* Re: [PATCH net-next V2 2/2] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
  2022-07-18 19:49     ` Tariq Toukan
  2022-07-18 21:24       ` Peter Zijlstra
@ 2022-07-18 21:57       ` Jakub Kicinski
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2022-07-18 21:57 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Peter Zijlstra, Tariq Toukan, David S. Miller, Saeed Mahameed,
	Ingo Molnar, Juri Lelli, Eric Dumazet, Paolo Abeni, netdev,
	Gal Pressman, Vincent Guittot, linux-kernel

On Mon, 18 Jul 2022 22:49:21 +0300 Tariq Toukan wrote:
> >> @@ -830,8 +887,7 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
> >>   		ret = -ENOMEM;
> >>   		goto free_irqs;
> >>   	}
> >> -	for (i = 0; i < ncomp_eqs; i++)
> >> -		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> >> +	mlx5_set_eqs_cpus(dev, cpus, ncomp_eqs);  
> > 
> > So you change this for mlx5, what about the other users of
> > cpumask_local_spread() ?  
> 
> I took a look at the different netdev users.
> While some users have similar use case to ours (affinity hints), many 
> others use cpumask_local_spread in other flows (XPS setting, ring 
> allocations, etc..).
> 
> Moving them to use the newly exposed API needs some deeper dive into 
> their code, especially due to the possible undesired side-effects.
> 
> I prefer not to include these changes in my series for now, but probably 
> contribute it in a followup work.

It'd be great if you could pick any other driver and start creating 
the right APIs for it and mlx5. "Probably contribute followup work"
does not inspire confidence. And yes, I am being picky; I'm holding 
a grudge against mlx5 for not using netif_get_num_default_rss_queues().


