linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jirka Hladky <jhladky@redhat.com>
To: Hillf Danton <hdanton@sina.com>, Mel Gorman <mgorman@suse.de>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
	"bsegall@google.com" <bsegall@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linuxarm <linuxarm@huawei.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Phil Auld <pauld@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	"kkolakow@redhat.com" <kkolakow@redhat.com>,
	Jiri Vozar <jvozar@redhat.com>
Subject: Re: [PATCH] sched/fair: use dst group while checking imbalance for NUMA balancer
Date: Thu, 10 Sep 2020 23:50:04 +0200	[thread overview]
Message-ID: <CAE4VaGDVmAmSY1pkiDPGUm=F_0aTtaqjC7D11_ExCpSJ_Mhhmg@mail.gmail.com> (raw)
In-Reply-To: <20200908010717.12436-1-hdanton@sina.com>

Hi Hilf and Mel,

thanks a lot for bringing this to my attention. We have tested the
proposed patch and we are getting excellent results so far!

1) Threads stability has improved a lot. We see much fewer threads
migrations. Check for example this heatmap based on the mpstat results
collected while running sp.C test from the NAS benchmark. sp.C was run
with 16 threads on a dual-socket AMD 7351 server with 8 NUMA nodes:
5.9 vanilla https://drive.google.com/file/d/10rojhTSQUSu-1aGQi-srr99SmVQ3Llgo/view?usp=sharing
5.9 with the patch
https://drive.google.com/file/d/1eZdTBWvWMNvdvXqitwAlcKZ7gb-ySQUl/view?usp=sharing

The heatmaps are generated from the mpstat output (there are 5
different runs at one picture). We collect CPU utilization every 5
seconds. Lighter color corresponds to lower CPU utilization. Light
color means that the thread may have run on different CPUs during that
5 seconds. Solid straight lines, on the other hand, mean that thread
was running on the same CPU all the time. The difference is striking.

We see significantly fewer threads migrations across many different
tests - NAS, SPECjbb2005, SPECjvm2008

2) We see also performance improvement in terms of runtime, especially
at low load scenarios (number of threads being roughly equal to the 2*
number of NUMA nodes). It fixes the performance drop which we see
since 5.7 kernel, discussed for example here:
https://lore.kernel.org/lkml/CAE4VaGB7+sR1nf3Ux8W=hgN46gNXRYr0uBWJU0oYnk7h00Y_dw@mail.gmail.com/

The biggest improvements are for the NAS and the SPECjvm2008
benchmarks (typically between 20-50%). SPECjbb2005 shows also
improvements, around 10%. This is again on NUMA servers at the low
utilization. You can find snapshots of some results here:
https://drive.google.com/drive/folders/1k3Gb-vlI7UjPZZcLkoL2W2VB_zqxIJ3_?usp=sharing

@Mel - could you please test the proposed patch? I know you have good
coverage for the inter-process communication benchmarks which may show
different behavior than benchmarks which we use. I hope that fewer
threads migrations might show positive effects also for these tests.
Please give it a try.

Thanks a lot!
Jirka


On Tue, Sep 8, 2020 at 3:07 AM Hillf Danton <hdanton@sina.com> wrote:
>
>
> On Mon, 7 Sep 2020 18:01:06 +0530 Srikar Dronamraju wrote:
> > > > On Mon, Sep 07, 2020 at 07:27:08PM +1200, Barry Song wrote:
> > > > > Something is wrong. In find_busiest_group(), we are checking if src has
> > > > > higher load, however, in task_numa_find_cpu(), we are checking if dst
> > > > > will have higher load after balancing. It seems it is not sensible to
> > > > > check src.
> > > > > It maybe cause wrong imbalance value, for example, if
> > > > > dst_running = env->dst_stats.nr_running + 1 results in 3 or above, and
> > > > > src_running = env->src_stats.nr_running - 1 results in 1;
> > > > > The current code is thinking imbalance as 0 since src_running is smaller
> > > > > than 2.
> > > > > This is inconsistent with load balancer.
> > > > >
>
> Hi Srikar and Barry
> >
> > I have observed the similar behaviour what Barry Song has documented with a
> > simple ebizzy with less threads on a 2 node system
>
> Thanks for your testing.
> >
> > ebizzy -t 6 -S 100
> >
> > We see couple of ebizzy threads moving back and forth between the 2 nodes
> > because of numa balancer and load balancer trying to do the exact opposite.
> >
> > However with Barry's patch, couple of tests regress heavily. (Any numa
> > workload that has shared numa faults).
> > For example:
> > perf bench numa mem --no-data_rand_walk -p 1 -t 6 -G 0 -P 3072 -T 0 -l 50 -c
> >
> > I also don't understand the rational behind checking for dst_running in numa
> > balancer path. This almost means no numa balancing in lightly loaded scenario.
> >
> > So agree with Mel that we should probably test more scenarios before
> > we accept this patch.
>
> Take a look at Jirka's work [1] please if you have any plan to do some
> more tests.
>
> [1] https://lore.kernel.org/lkml/CAE4VaGB7+sR1nf3Ux8W=hgN46gNXRYr0uBWJU0oYnk7h00Y_dw@mail.gmail.com/
> >
> > > >
> > > > It checks the conditions if the move was to happen. Have you evaluated
> > > > this for a NUMA balancing load and confirmed it a) balances properly and
> > > > b) does not increase the scan rate trying to "fix" the problem?
> > >
> > > I think the original code was trying to check if the numa migration
> > > would lead to new imbalance in load balancer. In case src is A, dst is B, and
> > > both of them have nr_running as 2. A moves one task to B, then A
> > > will have 1, B will have 3. In load balancer, A will try to pull task
> > > from B since B's nr_running is larger than min_imbalance. But the code
> > > is saying imbalance=0 by finding A's nr_running is smaller than
> > > min_imbalance.
> > >
> > > Will share more test data if you need.
> > >
> > > >
> > > > --
> > > > Mel Gorman
> > > > SUSE Labs
> > >
> > > Thanks
> > > Barry
> >
> > --
> > Thanks and Regards
> > Srikar Dronamraju
>
>


-- 
-Jirka


  parent reply	other threads:[~2020-09-10 21:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07  7:27 [PATCH] sched/fair: use dst group while checking imbalance for NUMA balancer Barry Song
2020-09-07  9:27 ` Mel Gorman
2020-09-07  9:44   ` Song Bao Hua (Barry Song)
2020-09-07 10:44     ` Mel Gorman
2020-09-07 12:31     ` Srikar Dronamraju
     [not found]       ` <20200908010717.12436-1-hdanton@sina.com>
2020-09-10 21:50         ` Jirka Hladky [this message]
2020-09-21 11:02           ` Mel Gorman
2020-09-21 16:02             ` Jirka Hladky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAE4VaGDVmAmSY1pkiDPGUm=F_0aTtaqjC7D11_ExCpSJ_Mhhmg@mail.gmail.com' \
    --to=jhladky@redhat.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hdanton@sina.com \
    --cc=juri.lelli@redhat.com \
    --cc=jvozar@redhat.com \
    --cc=kkolakow@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=mgorman@suse.de \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=song.bao.hua@hisilicon.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).