From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@surriel.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	linuxppc-dev@lists.ozlabs.org,
	Nathan Lynch <nathanl@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Scott Cheloha <cheloha@linux.ibm.com>,
	Gautham R Shenoy <ego@linux.vnet.ibm.com>,
	Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Subject: [PATCH 1/3] sched/topology: Allow archs to populate distance map
Date: Thu, 20 May 2021 21:14:25 +0530
Message-ID: <20210520154427.1041031-2-srikar@linux.vnet.ibm.com>
In-Reply-To: <20210520154427.1041031-1-srikar@linux.vnet.ibm.com>

Currently the scheduler populates the distance map by looking at the
distance of each node from all other nodes. This should work for most
architectures and platforms.

However, there are some architectures like POWER that may not expose the
distance of nodes that are not yet onlined, because those resources are
not yet allocated to the OS instance. Such architectures have other means
to provide valid distance data for the current platform.

For example, distance info from numactl on a fully populated 8-node
system at boot may look like this:

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  40  40  40  40  40  40
  1:  20  10  40  40  40  40  40  40
  2:  40  40  10  20  40  40  40  40
  3:  40  40  20  10  40  40  40  40
  4:  40  40  40  40  10  20  40  40
  5:  40  40  40  40  20  10  40  40
  6:  40  40  40  40  40  40  10  20
  7:  40  40  40  40  40  40  20  10

However, when only two nodes of the same system are online at boot, the
NUMA topology will look like this:

node distances:
node   0   1
  0:  10  20
  1:  20  10

What node_distance(0,3) returns when node 0 is online and node 3 is
offline may be implementation dependent. In the POWER case, it returns
LOCAL_DISTANCE(10). Here, at boot the scheduler would assume that the max
distance between nodes is 20. However, that would not be true: when nodes
are onlined and CPUs from those nodes are hotplugged, the max node
distance would be 40.

To handle such scenarios, let the scheduler allow architectures to
populate the distance map. Architectures that want to populate the
distance map can override arch_populate_distance_map().

Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 kernel/sched/topology.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 053115b55f89..ccb9aff59add 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1630,6 +1630,26 @@ static void init_numa_topology_type(void)
 
 #define NR_DISTANCE_VALUES (1 << DISTANCE_BITS)
 
+#ifndef arch_populate_distance_map
+static int arch_populate_distance_map(unsigned long *distance_map)
+{
+	int i, j;
+
+	for (i = 0; i < nr_node_ids; i++) {
+		for (j = 0; j < nr_node_ids; j++) {
+			int distance = node_distance(i, j);
+
+			if (distance < LOCAL_DISTANCE || distance >= NR_DISTANCE_VALUES) {
+				sched_numa_warn("Invalid distance value range");
+				return -1;
+			}
+			bitmap_set(distance_map, distance, 1);
+		}
+	}
+	return 0;
+}
+#endif
+
 void sched_init_numa(void)
 {
 	struct sched_domain_topology_level *tl;
@@ -1646,18 +1666,10 @@ void sched_init_numa(void)
 		return;
 
 	bitmap_zero(distance_map, NR_DISTANCE_VALUES);
-	for (i = 0; i < nr_node_ids; i++) {
-		for (j = 0; j < nr_node_ids; j++) {
-			int distance = node_distance(i, j);
 
-			if (distance < LOCAL_DISTANCE || distance >= NR_DISTANCE_VALUES) {
-				sched_numa_warn("Invalid distance value range");
-				return;
-			}
+	if (arch_populate_distance_map(distance_map))
+		return;
 
-			bitmap_set(distance_map, distance, 1);
-		}
-	}
 	/*
 	 * We can now figure out how many unique distance values there are and
 	 * allocate memory accordingly.
-- 
2.27.0
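
The effect described in the changelog can be reproduced outside the kernel. The following is a minimal userspace sketch, not kernel code: it mirrors the loop in the default arch_populate_distance_map() but reads distances from a plain array instead of node_distance(). The names populate_distance_map(), count_unique_distances(), two_node and eight_node are hypothetical, the tables are copied from the numactl output quoted above, and DISTANCE_BITS is assumed to be 8.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* LOCAL_DISTANCE comes from the changelog; DISTANCE_BITS = 8 is an assumption. */
#define LOCAL_DISTANCE     10
#define DISTANCE_BITS      8
#define NR_DISTANCE_VALUES (1 << DISTANCE_BITS)
#define BITS_PER_WORD      (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS          ((NR_DISTANCE_VALUES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Distance tables as reported by numactl in the changelog (row-major). */
const int two_node[2 * 2] = {
	10, 20,
	20, 10,
};

const int eight_node[8 * 8] = {
	10, 20, 40, 40, 40, 40, 40, 40,
	20, 10, 40, 40, 40, 40, 40, 40,
	40, 40, 10, 20, 40, 40, 40, 40,
	40, 40, 20, 10, 40, 40, 40, 40,
	40, 40, 40, 40, 10, 20, 40, 40,
	40, 40, 40, 40, 20, 10, 40, 40,
	40, 40, 40, 40, 40, 40, 10, 20,
	40, 40, 40, 40, 40, 40, 20, 10,
};

/*
 * Hypothetical stand-in for the default arch_populate_distance_map():
 * set one bit per distance value seen in an n x n table.
 * Returns 0 on success, -1 if a distance is out of range.
 */
int populate_distance_map(unsigned long *map, const int *table, int n)
{
	for (int i = 0; i < n; i++) {
		for (int j = 0; j < n; j++) {
			int distance = table[i * n + j];

			if (distance < LOCAL_DISTANCE || distance >= NR_DISTANCE_VALUES) {
				fprintf(stderr, "Invalid distance value range\n");
				return -1;
			}
			map[distance / BITS_PER_WORD] |= 1UL << (distance % BITS_PER_WORD);
		}
	}
	return 0;
}

/* Count the distinct distance values in the table, or return -1 on error. */
int count_unique_distances(const int *table, int n)
{
	unsigned long map[MAP_WORDS];
	int count = 0;

	assert(table && n > 0);
	memset(map, 0, sizeof(map));
	if (populate_distance_map(map, table, n))
		return -1;

	for (int d = 0; d < NR_DISTANCE_VALUES; d++)
		if (map[d / BITS_PER_WORD] & (1UL << (d % BITS_PER_WORD)))
			count++;
	return count;
}
```

With these tables, count_unique_distances(two_node, 2) finds two distinct distances (10 and 20), while count_unique_distances(eight_node, 8) finds three (10, 20 and 40). The missing value 40 is what the scheduler cannot learn at boot when only two nodes are online, and is the gap an architecture-provided arch_populate_distance_map() is meant to fill.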