From: Vincent Guittot <vincent.guittot@linaro.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Valentin Schneider <valentin.schneider@arm.com>,
Juri Lelli <juri.lelli@redhat.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/4] sched: Limit the amount of NUMA imbalance that can exist at fork time
Date: Fri, 20 Nov 2020 14:33:25 +0100
Message-ID: <CAKfTPtDGDEHpOFC29+o-K434CB4jBT6JON07DR5Hhar7wPyybw@mail.gmail.com>
In-Reply-To: <20201120090630.3286-5-mgorman@techsingularity.net>
On Fri, 20 Nov 2020 at 10:06, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> At fork time currently, a local node can be allowed to fill completely,
> leaving the periodic load balancer to fix the problem. This can be
> problematic in cases where a task creates lots of threads that idle until
> woken as part of a worker pool, causing a memory bandwidth problem.
>
> However, a "real" workload suffers badly from this behaviour. The workload
> in question is mostly NUMA aware but spawns large numbers of threads
> that act as a worker pool that can be called from anywhere. These need
> to spread early to get reasonable behaviour.
>
> This patch limits how much a local node can fill before spilling over
> to another node and it will not be a universal win. Specifically,
> very short-lived workloads that fit within a NUMA node would prefer
> the memory bandwidth.
>
> As I cannot describe the "real" workload, the best proxy measure I found
> for illustration was a page fault microbenchmark. It's not representative
> of the workload but demonstrates the hazard of the current behaviour.
>
> pft timings
>                                  5.10.0-rc2             5.10.0-rc2
>                           imbalancefloat-v2          forkspread-v2
> Amean     elapsed-1        46.37 (   0.00%)       46.05 *   0.69%*
> Amean     elapsed-4        12.43 (   0.00%)       12.49 *  -0.47%*
> Amean     elapsed-7         7.61 (   0.00%)        7.55 *   0.81%*
> Amean     elapsed-12        4.79 (   0.00%)        4.80 (  -0.17%)
> Amean     elapsed-21        3.13 (   0.00%)        2.89 *   7.74%*
> Amean     elapsed-30        3.65 (   0.00%)        2.27 *  37.62%*
> Amean     elapsed-48        3.08 (   0.00%)        2.13 *  30.69%*
> Amean     elapsed-79        2.00 (   0.00%)        1.90 *   4.95%*
> Amean     elapsed-80        2.00 (   0.00%)        1.90 *   4.70%*
>
> This shows the time to fault regions belonging to threads. The target
> machine has 80 logical CPUs and two nodes. Note the ~30% gain when the
> machine is approximately at the point where one node becomes fully
> utilised. The slower results are borderline noise.
>
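For anyone reading the table above: the percentage column appears to be the
relative reduction in elapsed time against the baseline kernel, so a positive
value means the patched kernel was faster. The helper below is only a sketch
of that arithmetic. The name elapsed_gain_pct is made up for illustration and
is not part of the benchmark tooling.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from the benchmark tooling.
 * Returns the relative gain in percent; a positive result means
 * `patched` took less time than `baseline`.
 */
static inline double elapsed_gain_pct(double baseline, double patched)
{
	assert(baseline > 0.0);
	return (baseline - patched) / baseline * 100.0;
}
```

Feeding it the rounded means for elapsed-30 (3.65 and 2.27) gives roughly
37.8%, close to the 37.62% in the table. The small difference presumably
comes from the table being computed on unrounded means, but that is an
assumption on my side.
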
> Kernel building shows similar benefits around the same balance point.
> Generally performance was either neutral or better in the tests conducted.
> The main consideration with this patch is the point where fork stops
> spreading a task. Some workloads may benefit from different balance
> points, but it would be a risky tuning parameter.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/fair.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e17e6c5da1d5..6d1c24708664 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8761,6 +8761,16 @@ static bool update_pick_idlest(struct sched_group *idlest,
>  	return true;
>  }
>
> +/*
> + * Allow a NUMA imbalance if busy CPUs is less than 25% of the domain.
> + * This is an approximation as the number of running tasks may not be
> + * related to the number of busy CPUs due to sched_setaffinity.
> + */
> +static inline bool allow_numa_imbalance(int dst_running, int dst_weight)
> +{
> +	return (dst_running < (dst_weight >> 2));
> +}
> +
>  /*
>   * find_idlest_group() finds and returns the least busy CPU group within the
>   * domain.
> @@ -8893,7 +8903,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
>  			 * a real need of migration, periodic load balance will
>  			 * take care of it.
>  			 */
> -			if (local_sgs.idle_cpus)
> +			if (allow_numa_imbalance(local_sgs.sum_nr_running, sd->span_weight))
>  				return NULL;
>  		}
>
> @@ -9000,11 +9010,14 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>  static inline long adjust_numa_imbalance(int imbalance,
>  				int dst_running, int dst_weight)
>  {
> +	if (!allow_numa_imbalance(dst_running, dst_weight))
> +		return imbalance;
> +
>  	/*
>  	 * Allow a small imbalance based on a simple pair of communicating
>  	 * tasks that remain local when the destination is lightly loaded.
>  	 */
> -	if (dst_running < (dst_weight >> 2) && imbalance <= NUMA_IMBALANCE_MIN)
> +	if (imbalance <= NUMA_IMBALANCE_MIN)
>  		return 0;
>
>  	return imbalance;
> --
> 2.26.2
>
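To make the effect of the two hunks easier to follow, here is a small
userspace sketch of the logic. It is a simplified model and not the kernel
code. NUMA_IMBALANCE_MIN is assumed to be 2 purely for illustration, since
its definition is not shown in this patch. The fork_stays_local_old() and
fork_stays_local_new() helpers and the _old/_new suffixes are hypothetical
names that stand in for the condition changed in find_idlest_group() and for
the two versions of adjust_numa_imbalance().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration only; not defined in this patch. */
#define NUMA_IMBALANCE_MIN 2

/* Same test as the helper added by the patch: fewer than 25% of the weight. */
static inline bool allow_numa_imbalance(int dst_running, int dst_weight)
{
	assert(dst_weight > 0);
	return dst_running < (dst_weight >> 2);
}

/* Hypothetical model of the old fork-time check: any idle local CPU wins. */
static inline bool fork_stays_local_old(int local_idle_cpus)
{
	return local_idle_cpus > 0;
}

/* Hypothetical model of the new fork-time check from the second hunk. */
static inline bool fork_stays_local_new(int local_running, int span_weight)
{
	return allow_numa_imbalance(local_running, span_weight);
}

/* adjust_numa_imbalance() as it reads before the patch. */
static inline long adjust_numa_imbalance_old(int imbalance,
				int dst_running, int dst_weight)
{
	if (dst_running < (dst_weight >> 2) && imbalance <= NUMA_IMBALANCE_MIN)
		return 0;

	return imbalance;
}

/* adjust_numa_imbalance() as it reads after the patch. */
static inline long adjust_numa_imbalance_new(int imbalance,
				int dst_running, int dst_weight)
{
	if (!allow_numa_imbalance(dst_running, dst_weight))
		return imbalance;

	if (imbalance <= NUMA_IMBALANCE_MIN)
		return 0;

	return imbalance;
}
```

Assuming the NUMA domain spans all 80 CPUs of the test machine, the cutoff is
80 >> 2 = 20, so in this model fork keeps tasks local only while fewer than
20 tasks are running in the local group, where the old check kept them local
as long as one local CPU was idle. The two versions of
adjust_numa_imbalance() return the same value for every input, so the third
hunk reads as a refactoring that shares the 25% test with the fork path.
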