stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Valentin Schneider <valentin.schneider@arm.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	dann frazier <dann.frazier@canonical.com>
Subject: [PATCH 4.19 28/30] sched/topology: Make sched_init_numa() use a set for the deduplicating sort
Date: Mon, 14 Mar 2022 12:34:46 +0100	[thread overview]
Message-ID: <20220314112732.587290871@linuxfoundation.org> (raw)
In-Reply-To: <20220314112731.785042288@linuxfoundation.org>

From: Valentin Schneider <valentin.schneider@arm.com>

commit 620a6dc40754dc218f5b6389b5d335e9a107fd29 upstream.

The deduplicating sort in sched_init_numa() assumes that the first line in
the distance table contains all unique values in the entire table. I've
been trying to pen what this exactly means for the topology, but it's not
straightforward. For instance, topology.c uses this example:

  node   0   1   2   3
    0:  10  20  20  30
    1:  20  10  20  20
    2:  20  20  10  20
    3:  30  20  20  10

  0 ----- 1
  |     / |
  |   /   |
  | /     |
  2 ----- 3

Which works out just fine. However, if we swap nodes 0 and 1:

  1 ----- 0
  |     / |
  |   /   |
  | /     |
  2 ----- 3

we get this distance table:

  node   0  1  2  3
    0:  10 20 20 20
    1:  20 10 20 30
    2:  20 20 10 20
    3:  20 30 20 10

Which breaks the deduplicating sort (non-representative first line). In
this case this would just be a renumbering exercise, but it so happens that
we can have a deduplicating sort that goes through the whole table in O(n²)
at the extra cost of a temporary memory allocation (i.e. any form of set).

The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following
this, implement the set as a 256-bits bitmap. Should this not be
satisfactory (i.e. we want to support 32-bit values), then we'll have to go
for some other sparse set implementation.

This has the added benefit of letting us allocate just the right amount of
memory for sched_domains_numa_distance[], rather than an arbitrary
(nr_node_ids + 1).

Note: DT binding equivalent (distance-map) decodes distances as 32-bit
values.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/topology.h |    1 
 kernel/sched/topology.c  |   99 ++++++++++++++++++++++-------------------------
 2 files changed, 49 insertions(+), 51 deletions(-)

--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -47,6 +47,7 @@ int arch_update_cpu_topology(void);
 /* Conform to ACPI 2.0 SLIT distance definitions */
 #define LOCAL_DISTANCE		10
 #define REMOTE_DISTANCE		20
+#define DISTANCE_BITS           8
 #ifndef node_distance
 #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
 #endif
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1322,66 +1322,58 @@ static void init_numa_topology_type(void
 	}
 }
 
+
+#define NR_DISTANCE_VALUES (1 << DISTANCE_BITS)
+
 void sched_init_numa(void)
 {
-	int next_distance, curr_distance = node_distance(0, 0);
 	struct sched_domain_topology_level *tl;
-	int level = 0;
-	int i, j, k;
-
-	sched_domains_numa_distance = kzalloc(sizeof(int) * (nr_node_ids + 1), GFP_KERNEL);
-	if (!sched_domains_numa_distance)
-		return;
-
-	/* Includes NUMA identity node at level 0. */
-	sched_domains_numa_distance[level++] = curr_distance;
-	sched_domains_numa_levels = level;
+	unsigned long *distance_map;
+	int nr_levels = 0;
+	int i, j;
 
 	/*
 	 * O(nr_nodes^2) deduplicating selection sort -- in order to find the
 	 * unique distances in the node_distance() table.
-	 *
-	 * Assumes node_distance(0,j) includes all distances in
-	 * node_distance(i,j) in order to avoid cubic time.
 	 */
-	next_distance = curr_distance;
+	distance_map = bitmap_alloc(NR_DISTANCE_VALUES, GFP_KERNEL);
+	if (!distance_map)
+		return;
+
+	bitmap_zero(distance_map, NR_DISTANCE_VALUES);
 	for (i = 0; i < nr_node_ids; i++) {
 		for (j = 0; j < nr_node_ids; j++) {
-			for (k = 0; k < nr_node_ids; k++) {
-				int distance = node_distance(i, k);
+			int distance = node_distance(i, j);
 
-				if (distance > curr_distance &&
-				    (distance < next_distance ||
-				     next_distance == curr_distance))
-					next_distance = distance;
-
-				/*
-				 * While not a strong assumption it would be nice to know
-				 * about cases where if node A is connected to B, B is not
-				 * equally connected to A.
-				 */
-				if (sched_debug() && node_distance(k, i) != distance)
-					sched_numa_warn("Node-distance not symmetric");
-
-				if (sched_debug() && i && !find_numa_distance(distance))
-					sched_numa_warn("Node-0 not representative");
+			if (distance < LOCAL_DISTANCE || distance >= NR_DISTANCE_VALUES) {
+				sched_numa_warn("Invalid distance value range");
+				return;
 			}
-			if (next_distance != curr_distance) {
-				sched_domains_numa_distance[level++] = next_distance;
-				sched_domains_numa_levels = level;
-				curr_distance = next_distance;
-			} else break;
+
+			bitmap_set(distance_map, distance, 1);
 		}
+	}
+	/*
+	 * We can now figure out how many unique distance values there are and
+	 * allocate memory accordingly.
+	 */
+	nr_levels = bitmap_weight(distance_map, NR_DISTANCE_VALUES);
 
-		/*
-		 * In case of sched_debug() we verify the above assumption.
-		 */
-		if (!sched_debug())
-			break;
+	sched_domains_numa_distance = kcalloc(nr_levels, sizeof(int), GFP_KERNEL);
+	if (!sched_domains_numa_distance) {
+		bitmap_free(distance_map);
+		return;
 	}
 
+	for (i = 0, j = 0; i < nr_levels; i++, j++) {
+		j = find_next_bit(distance_map, NR_DISTANCE_VALUES, j);
+		sched_domains_numa_distance[i] = j;
+	}
+
+	bitmap_free(distance_map);
+
 	/*
-	 * 'level' contains the number of unique distances
+	 * 'nr_levels' contains the number of unique distances
 	 *
 	 * The sched_domains_numa_distance[] array includes the actual distance
 	 * numbers.
@@ -1390,15 +1382,15 @@ void sched_init_numa(void)
 	/*
 	 * Here, we should temporarily reset sched_domains_numa_levels to 0.
 	 * If it fails to allocate memory for array sched_domains_numa_masks[][],
-	 * the array will contain less then 'level' members. This could be
+	 * the array will contain less then 'nr_levels' members. This could be
 	 * dangerous when we use it to iterate array sched_domains_numa_masks[][]
 	 * in other functions.
 	 *
-	 * We reset it to 'level' at the end of this function.
+	 * We reset it to 'nr_levels' at the end of this function.
 	 */
 	sched_domains_numa_levels = 0;
 
-	sched_domains_numa_masks = kzalloc(sizeof(void *) * level, GFP_KERNEL);
+	sched_domains_numa_masks = kzalloc(sizeof(void *) * nr_levels, GFP_KERNEL);
 	if (!sched_domains_numa_masks)
 		return;
 
@@ -1406,7 +1398,7 @@ void sched_init_numa(void)
 	 * Now for each level, construct a mask per node which contains all
 	 * CPUs of nodes that are that many hops away from us.
 	 */
-	for (i = 0; i < level; i++) {
+	for (i = 0; i < nr_levels; i++) {
 		sched_domains_numa_masks[i] =
 			kzalloc(nr_node_ids * sizeof(void *), GFP_KERNEL);
 		if (!sched_domains_numa_masks[i])
@@ -1414,12 +1406,17 @@ void sched_init_numa(void)
 
 		for (j = 0; j < nr_node_ids; j++) {
 			struct cpumask *mask = kzalloc(cpumask_size(), GFP_KERNEL);
+			int k;
+
 			if (!mask)
 				return;
 
 			sched_domains_numa_masks[i][j] = mask;
 
 			for_each_node(k) {
+				if (sched_debug() && (node_distance(j, k) != node_distance(k, j)))
+					sched_numa_warn("Node-distance not symmetric");
+
 				if (node_distance(j, k) > sched_domains_numa_distance[i])
 					continue;
 
@@ -1431,7 +1428,7 @@ void sched_init_numa(void)
 	/* Compute default topology size */
 	for (i = 0; sched_domain_topology[i].mask; i++);
 
-	tl = kzalloc((i + level + 1) *
+	tl = kzalloc((i + nr_levels) *
 			sizeof(struct sched_domain_topology_level), GFP_KERNEL);
 	if (!tl)
 		return;
@@ -1454,7 +1451,7 @@ void sched_init_numa(void)
 	/*
 	 * .. and append 'j' levels of NUMA goodness.
 	 */
-	for (j = 1; j < level; i++, j++) {
+	for (j = 1; j < nr_levels; i++, j++) {
 		tl[i] = (struct sched_domain_topology_level){
 			.mask = sd_numa_mask,
 			.sd_flags = cpu_numa_flags,
@@ -1466,8 +1463,8 @@ void sched_init_numa(void)
 
 	sched_domain_topology = tl;
 
-	sched_domains_numa_levels = level;
-	sched_max_numa_distance = sched_domains_numa_distance[level - 1];
+	sched_domains_numa_levels = nr_levels;
+	sched_max_numa_distance = sched_domains_numa_distance[nr_levels - 1];
 
 	init_numa_topology_type();
 }



  parent reply	other threads:[~2022-03-14 11:41 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-14 11:34 [PATCH 4.19 00/30] 4.19.235-rc1 review Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 01/30] net: qlogic: check the return value of dma_alloc_coherent() in qed_vf_hw_prepare() Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 02/30] qed: return status of qed_iov_get_link Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 03/30] ethernet: Fix error handling in xemaclite_of_probe Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 04/30] net: ethernet: ti: cpts: Handle error for clk_enable Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 05/30] net: ethernet: lpc_eth: " Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 06/30] ax25: Fix NULL pointer dereference in ax25_kill_by_device Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 07/30] net/mlx5: Fix size field in bufferx_reg struct Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 08/30] NFC: port100: fix use-after-free in port100_send_complete Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 09/30] gpio: ts4900: Do not set DAT and OE together Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 10/30] gianfar: ethtool: Fix refcount leak in gfar_get_ts_info Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 11/30] net: phy: DP83822: clear MISR2 register to disable interrupts Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 12/30] sctp: fix kernel-infoleak for SCTP sockets Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 13/30] net-sysfs: add check for netdevice being present to speed_show Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 14/30] Revert "xen-netback: remove hotplug-status once it has served its purpose" Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 15/30] Revert "xen-netback: Check for hotplug-status existence before watching" Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 16/30] tracing: Ensure trace buffer is at least 4096 bytes large Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 17/30] selftests/memfd: clean up mapping in mfd_fail_write Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 18/30] ARM: Spectre-BHB: provide empty stub for non-config Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 19/30] staging: gdm724x: fix use after free in gdm_lte_rx() Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 20/30] net: macb: Fix lost RX packet wakeup race in NAPI receive Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 21/30] riscv: Fix auipc+jalr relocation range checks Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 22/30] KVM: arm64: Reset PMC_EL0 to avoid a panic() on systems with no PMU Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 23/30] virtio: unexport virtio_finalize_features Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 24/30] virtio: acknowledge all features before access Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 25/30] ARM: fix Thumb2 regression with Spectre BHB Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 26/30] ext4: add check to prevent attempting to resize an fs with sparse_super2 Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 27/30] btrfs: unlock newly allocated extent buffer after error Greg Kroah-Hartman
2022-03-14 11:34 ` Greg Kroah-Hartman [this message]
2022-03-14 11:34 ` [PATCH 4.19 29/30] sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa() Greg Kroah-Hartman
2022-03-14 11:34 ` [PATCH 4.19 30/30] ia64: ensure proper NUMA distance and possible map initialization Greg Kroah-Hartman
2022-03-14 13:58 ` [PATCH 4.19 00/30] 4.19.235-rc1 review Jon Hunter
2022-03-14 14:05   ` Greg Kroah-Hartman
2022-03-14 14:14     ` Jon Hunter
2022-03-14 14:57       ` Greg Kroah-Hartman
2022-03-15 12:14         ` James Morse
2022-03-15 12:28           ` Greg Kroah-Hartman
2022-03-14 14:32     ` Naresh Kamboju
2022-03-14 15:00       ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220314112732.587290871@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dann.frazier@canonical.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=valentin.schneider@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).