From: Jirka Hladky <jhladky@redhat.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Phil Auld <pauld@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Valentin Schneider <valentin.schneider@arm.com>,
LKML <linux-kernel@vger.kernel.org>,
Douglas Shakshober <dshaks@redhat.com>,
Waiman Long <longman@redhat.com>, Joe Mario <jmario@redhat.com>,
Bill Gray <bgray@redhat.com>,
"aokuliar@redhat.com" <aokuliar@redhat.com>,
"kkolakow@redhat.com" <kkolakow@redhat.com>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6
Date: Wed, 20 May 2020 18:01:19 +0200
Message-ID: <CAE4VaGBsjVYc0kOXjm8OgRQgg73rUcyovMAiqcTO7VhbOhxUFw@mail.gmail.com>
In-Reply-To: <CAE4VaGAxqK_gr7gstk1S8z3vx+9c6rG-Xo_kUiAzuOWpqOR4cQ@mail.gmail.com>
I have an update on the netperf-cstate-small-cross-socket results.
The performance degradation I reported (2.5% for UDP stream throughput
and 0.6% for TCP throughput) is for a message size of 16 kB. For
smaller message sizes, the drop is larger: up to 5% in UDP throughput
for a message size of 64 B. See the numbers below [1].
We still think that this is acceptable given the gains in other
scenarios (again compared to 5.7 vanilla):
* it solves the performance drop of up to 20% for single-instance
SPECjbb2005 on 8-NUMA-node servers (particularly on AMD EPYC Rome
systems); this drop was INCREASING with higher thread counts (10% for
16 threads and 20% for 32 threads)
* it solves the performance drop of up to 50% in low-load scenarios
(SPECjvm2008 and NAS)
[1]
Hillf's patch compared to 5.7 (rc4) vanilla, throughput change:

Message size (B)   TCP throughput   UDP throughput
64                 -2.6%            -5.0%
128                -2.3%            -3.0%
256                -2.6%            -3.0%
1024               -2.7%            -3.1%
2048               -2.2%            -3.3%
3312               -2.4%            -3.5%
4096               -1.1%            -4.0%
8192               -0.4%            -3.3%
16384              -0.6%            -2.6%
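As a quick cross-check of the statements above, this small C sketch tabulates the deltas from [1] and reports, per protocol, the mean change and the message size with the largest drop. The data is copied from the table; the names (summarise(), struct delta_summary) are illustrative only and not part of mmtests or any other suite.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput change (%) of Hillf's patch vs 5.7-rc4 vanilla, from table [1]. */
static const int msg_sizes[] = {64, 128, 256, 1024, 2048, 3312, 4096, 8192, 16384};
static const double tcp_delta[] = {-2.6, -2.3, -2.6, -2.7, -2.2, -2.4, -1.1, -0.4, -0.6};
static const double udp_delta[] = {-5.0, -3.0, -3.0, -3.1, -3.3, -3.5, -4.0, -3.3, -2.6};
#define N_SIZES (sizeof(msg_sizes) / sizeof(msg_sizes[0]))

struct delta_summary {
	double mean;	/* arithmetic mean over all message sizes */
	double worst;	/* most negative delta */
	int worst_size;	/* message size (B) where the worst delta occurs */
};

static struct delta_summary summarise(const double *delta, size_t n)
{
	struct delta_summary s;

	assert(n > 0 && n <= N_SIZES);
	s.mean = 0.0;
	s.worst = delta[0];
	s.worst_size = msg_sizes[0];
	for (size_t i = 0; i < n; i++) {
		s.mean += delta[i];
		if (delta[i] < s.worst) {	/* track the largest drop */
			s.worst = delta[i];
			s.worst_size = msg_sizes[i];
		}
	}
	s.mean /= (double)n;
	return s;
}
```

Run over both arrays, it gives a mean change of roughly -1.9% for TCP and -3.4% for UDP, with the worst cases at 1024 B for TCP (-2.7%) and 64 B for UDP (-5.0%).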
On Wed, May 20, 2020 at 3:58 PM Jirka Hladky <jhladky@redhat.com> wrote:
>
> Hi Hillf, Mel and all,
>
> thanks for the patch! It has produced really GOOD results.
>
> 1) It has fixed the performance problems of the 5.7 vanilla kernel for
> single-tenant workloads and low system load scenarios, without
> performance degradation for multi-tenant workloads. It produces the
> same results as the previous proof-of-concept patch, in which the
> adjust_numa_imbalance() function was turned into a no-op (returning
> the imbalance value it receives as input, unchanged).
>
> 2) We have also added Mel's netperf-cstate-small-cross-socket test to
> our test battery:
> https://github.com/gormanm/mmtests/blob/master/configs/config-network-netperf-cstate-small-cross-socket
>
> Mel told me that he had seen significant performance improvements with
> 5.7 over 5.6 for the netperf-cstate-small-cross-socket scenario.
>
> Of the 6 different patches we have tested, yours has performed best
> in this scenario. Compared to vanilla, we see only a small performance
> degradation: 2.5% for UDP stream throughput and 0.6% for TCP
> throughput. The testing was done on a dual-socket system with Intel
> Xeon Gold 6132 CPUs.
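To make point 1 concrete, here is a userspace C model contrasting the helper as it stands in 5.7 with the no-op proof-of-concept. It assumes the 5.7 helper tolerates an imbalance (reports 0) only while the source runs at most 2 tasks, the hard-coded 2 that Hillf mentions in the message quoted further down. numa_move_allowed() is a hypothetical helper mirroring the unpatched caller in task_numa_find_cpu() (the '-' lines of the first hunk of that diff); node-type checks are left out, and none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 5.7 form: tolerate imbalance while src runs <= 2 tasks. */
static long adjust_numa_imbalance_v57(int imbalance, int src_nr_running)
{
	const int imbalance_min = 2;

	if (src_nr_running <= imbalance_min)
		return 0;
	return imbalance;
}

/* Proof-of-concept variant: a no-op that returns its input unchanged. */
static long adjust_numa_imbalance_noop(int imbalance, int src_nr_running)
{
	(void)src_nr_running;
	return imbalance;
}

/*
 * Hypothetical model of the unpatched caller: may one task move from a
 * node running src_nr_running tasks to an idle CPU on a node running
 * dst_nr_running tasks?
 */
static bool numa_move_allowed(long (*adjust)(int, int),
			      int src_nr_running, int dst_nr_running)
{
	int src_running = src_nr_running - 1;	/* counts after the move */
	int dst_running = dst_nr_running + 1;
	int imbalance = dst_running - src_running;

	assert(src_nr_running >= 1);		/* the task runs on src */
	if (imbalance < 0)
		imbalance = 0;			/* max(0, dst - src) */
	return adjust(imbalance, src_running) == 0;
}
```

With the 5.7 helper, numa_move_allowed(adjust_numa_imbalance_v57, 1, 0) is true: moving a lone task to an idle node leaves an imbalance of 1, which is tolerated. The no-op variant rejects the same move.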
>
> @Mel - could you please test Hillf's patch with your full testing
> suite? So far, it looks very promising, but I would like to check the
> patch thoroughly to make sure it does not hurt performance in other
> areas.
>
> Thanks a lot!
> Jirka
>
> On Tue, May 19, 2020 at 6:32 AM Hillf Danton <hdanton@sina.com> wrote:
> >
> >
> > Hi Jirka
> >
> > On Mon, 18 May 2020 16:52:52 +0200 Jirka Hladky wrote:
> > >
> > > We have compared it against a kernel with adjust_numa_imbalance
> > > disabled [1]. Both kernels perform at the same level for
> > > single-tenant jobs, but the proposed patch performs poorly in
> > > multi-tenant mode, where the kernel with adjust_numa_imbalance
> > > disabled is the clear winner.
> >
> > Double thanks to you for the tests!
> >
> > > We would be very interested in what others think about disabling the
> > > adjust_numa_imbalance function. The patch is below. It would be great
> >
> > A minute...
> >
> > > to collect performance results for different scenarios to make sure
> > > the results are objective.
> >
> > I don't have another test case, but here is a diff that tries to
> > confine the tool in question, adjust_numa_imbalance(), back to the
> > scope of its hard-coded 2.
> >
> > In the first hunk below, it is used to detect an imbalance before
> > migrating a task; the second hunk makes a small change at the other
> > call site, where idle CPUs are balanced.
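The decision made by the first hunk of the diff in this message can be sketched as a standalone C model. It assumes the 5.7 form of adjust_numa_imbalance() (return 0 while the source runs at most 2 tasks) and replaces the goto-based flow with early returns; patched_may_move() and src_has_spare are hypothetical names. Because the surrounding code only runs when the destination node has spare capacity, "node types differ" is modelled as "source has no spare capacity".

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 5.7 helper: tolerate imbalance while src runs <= 2 tasks. */
static long adjust_numa_imbalance(int imbalance, int src_nr_running)
{
	if (src_nr_running <= 2)
		return 0;
	return imbalance;
}

/*
 * Model of the patched check. Returns true when the patched code would
 * reach check_imb with imbalance == 0, i.e. the task may move to an
 * idle CPU on the destination node.
 */
static bool patched_may_move(bool src_has_spare, int src_nr_running,
			     int dst_nr_running)
{
	unsigned int imbalance = 2;

	assert(src_nr_running >= 1);	/* the task runs on src */

	/* Node types differ: imbalance stays 2, so no move. */
	if (!src_has_spare)
		return false;

	imbalance = adjust_numa_imbalance(imbalance, src_nr_running);

	/* Helper tolerated the imbalance: reset to 2, so no move. */
	if (!imbalance)
		return false;

	/* Move only if dst stays below src after gaining one task. */
	return dst_nr_running + 1 < src_nr_running;
}
```

For example, a source node with spare capacity running 2 tasks keeps them (the helper tolerates the imbalance, so imbalance is reset to 2), while a source running 4 tasks may move one to a destination running 1, since 1 + 1 < 4.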
> >
> > Thanks
> > Hillf
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1916,20 +1916,26 @@ static void task_numa_find_cpu(struct ta
> >  	 * imbalance that would be overruled by the load balancer.
> >  	 */
> >  	if (env->dst_stats.node_type == node_has_spare) {
> > -		unsigned int imbalance;
> > -		int src_running, dst_running;
> > +		unsigned int imbalance = 2;
> >
> > -		/*
> > -		 * Would movement cause an imbalance? Note that if src has
> > -		 * more running tasks that the imbalance is ignored as the
> > -		 * move improves the imbalance from the perspective of the
> > -		 * CPU load balancer.
> > -		 * */
> > -		src_running = env->src_stats.nr_running - 1;
> > -		dst_running = env->dst_stats.nr_running + 1;
> > -		imbalance = max(0, dst_running - src_running);
> > -		imbalance = adjust_numa_imbalance(imbalance, src_running);
> > +		//No imbalance computed without spare capacity
> > +		if (env->dst_stats.node_type != env->src_stats.node_type)
> > +			goto check_imb;
> > +
> > +		imbalance = adjust_numa_imbalance(imbalance,
> > +						env->src_stats.nr_running);
> > +
> > +		//Do nothing without imbalance
> > +		if (!imbalance) {
> > +			imbalance = 2;
> > +			goto check_imb;
> > +		}
> > +
> > +		//Migrate task if it's likely to grow balance
> > +		if (env->dst_stats.nr_running + 1 < env->src_stats.nr_running)
> > +			imbalance = 0;
> >
> > +check_imb:
> >  		/* Use idle CPU if there is no imbalance */
> >  		if (!imbalance) {
> >  			maymove = true;
> > @@ -9011,12 +9017,13 @@ static inline void calculate_imbalance(s
> >  			env->migration_type = migrate_task;
> >  			env->imbalance = max_t(long, 0, (local->idle_cpus -
> >  						 busiest->idle_cpus) >> 1);
> > -		}
> >
> > -		/* Consider allowing a small imbalance between NUMA groups */
> > -		if (env->sd->flags & SD_NUMA)
> > -			env->imbalance = adjust_numa_imbalance(env->imbalance,
> > -						busiest->sum_nr_running);
> > +			/* Consider allowing a small imbalance between NUMA groups */
> > +			if (env->sd->flags & SD_NUMA &&
> > +			    local->group_type == busiest->group_type)
> > +				env->imbalance = adjust_numa_imbalance(env->imbalance,
> > +							busiest->sum_nr_running);
> > +		}
> >
> >  		return;
> >  	}
> > --
> >
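The second hunk of the diff quoted above can be modelled the same way. In this sketch, sd_numa stands for env->sd->flags & SD_NUMA and same_group_type for local->group_type == busiest->group_type; the integer parameters stand in for the sg_lb_stats fields idle_cpus and sum_nr_running used in the hunk. The helper again assumes the 5.7 form, and patched_idle_imbalance() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 5.7 helper: tolerate imbalance while src runs <= 2 tasks. */
static long adjust_numa_imbalance(int imbalance, int src_nr_running)
{
	if (src_nr_running <= 2)
		return 0;
	return imbalance;
}

/* Model of the idle-CPU branch of calculate_imbalance() after the patch. */
static long patched_idle_imbalance(bool sd_numa, bool same_group_type,
				   int local_idle_cpus, int busiest_idle_cpus,
				   int busiest_sum_nr_running)
{
	long diff = (long)local_idle_cpus - busiest_idle_cpus;
	long imbalance;

	assert(local_idle_cpus >= 0 && busiest_idle_cpus >= 0);

	/* Same result as max_t(long, 0, diff >> 1); clamp before shifting. */
	if (diff < 0)
		diff = 0;
	imbalance = diff >> 1;

	/* New condition: tolerate a NUMA imbalance only between groups of the same type. */
	if (sd_numa && same_group_type)
		imbalance = adjust_numa_imbalance((int)imbalance,
						  busiest_sum_nr_running);
	return imbalance;
}
```

Before the patch, the adjustment sat after the closing brace of this branch and applied whenever SD_NUMA was set. With the patch, groups of different types keep the raw idle-CPU imbalance: patched_idle_imbalance(true, false, 10, 6, 2) returns 2, where the unpatched code would have reduced it to 0.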
>
>
> --
> -Jirka
--
-Jirka