From: Mel Gorman <mgorman@techsingularity.net>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>, Phil Auld <pauld@redhat.com>,
Peter Puhov <peter.puhov@linaro.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Robert Foley <robert.foley@linaro.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Jirka Hladky <jhladky@redhat.com>
Subject: Re: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal
Date: Wed, 4 Nov 2020 10:47:56 +0000 [thread overview]
Message-ID: <20201104104755.GC3371@techsingularity.net> (raw)
In-Reply-To: <CAKfTPtAjhv8JafvZFR8_UUfDM2MUzVGMPXVBx1zynhPXJ_oh3w@mail.gmail.com>
On Wed, Nov 04, 2020 at 11:06:06AM +0100, Vincent Guittot wrote:
> >
> > Hackbench failed to run because I typo'd the configuration. Kernel build
> > benchmark and git test suite both were inconclusive for 5.10-rc2
> > (neutral results) although the showed 10-20% gain for kernbench and 24%
> > gain in git test suite by reverting in 5.9.
> >
> > The gitsource test was interesting for a few reasons. First, the big
> > difference between 5.9 and 5.10 is that the workload is mostly concentrated
> > on one NUMA node. mpstat shows that 5.10-rc2 uses all of the CPUs on one
> > node lightly. Reverting the patch shows that far fewer CPUs are used at
> > a higher utilisation -- not particularly high utilisation because of the
> > nature of the workload but noticable. i.e. gitsource with the revert
> > packs the workload onto fewer CPUs. The same holds for fork_test --
> > reverting packs the workload onto fewer CPUs with higher utilisation on
> > each of them. Generally this plays well with cpufreq without schedutil
> > using fewer CPUs means the CPU is likely to reach higher frequencies.
>
> Which cpufreq governor are you using ?
>
Uhh, intel_pstate with ondemand .... which is surprising, I would have
expected powersave. I'd have to look closer at what happened there. It
might be a variation of the Kconfig mess selecting the wrong governors when
"yes '' | make oldconfig" is used.
> >
> > While it's possible that some other factor masked the impact of the patch,
> > the fact it's neutral for two workloads in 5.10-rc2 is suspicious as it
> > indicates that if the patch was implemented against 5.10-rc2, it would
> > likely not have been merged. I've queued the tests on the remaining
> > machines to see if something more conclusive falls out.
>
> I don't think that the goal of the patch is stressed by those benchmarks.
> I typically try to optimize the sequence:
> 1-fork a lot of threads that immediately wait
> 2-wake up all threads simultaneously to run in parallel
> 3-wait the end of all threads
>
Out of curiousity, have you a stock benchmark that does this with some
associated metric? sysbench-threads wouldn't do it. While I know of at
least one benchmark that *does* exhibit this pattern, it's a Real Workload
that cannot be shared (so I can't discuss it) and it's *complex* with a
minimal kernel footprint so analysing it is non-trivial.
I could develop one on my own but if you had one already, I'd wire it into
mmtests and add it to the stock collection of scheduler loads. schbench
*might* match what you're talking about but I'd rather not guess.
schbench is also more of a latency wakeup benchmark than it is a throughput
one. Latency ones tend to be more important but optimising purely for
wakeup-latency also tends to kick other workloads into a hole.
> Without the patch all newly forked threads were packed on few CPUs
> which were already idle when the next fork happened. Then the spreads
> were spread on CPUs at wakeup in the LLC but they have to wait for a
> LB to fill other sched domain
>
Which is fair enough but it's a tradeoff because there are plenty of
workloads that fork/exec and do something immediately and this is not
the first time we've had to tradeoff between workloads.
The other aspect I find interesting is that we get slightly burned by
the initial fork path because of this thing;
/*
* Otherwise, keep the task on this node to stay close
* its wakeup source and improve locality. If there is
* a real need of migration, periodic load balance will
* take care of it.
*/
if (local_sgs.idle_cpus)
return NULL;
For a workload that creates a lot of new threads that go idle and then
wakeup (think worker pool threads that receive requests at unpredictable
times), it packs one node too tightly when the threads wakeup -- it's
also visible from page fault microbenchmarks that scale the number of
threads. It's a vaguely similar class of problem but the patches are
taking very different approaches.
It'd been in my mind to consider reconciling that chunk with the
adjust_numa_imbalance but had not gotten around to seeing how it should
be reconciled without introducing another regression.
The longer I work on the scheduler, the more I feel it's like juggling
while someone is firing arrows at you :D .
--
Mel Gorman
SUSE Labs
next prev parent reply other threads:[~2020-11-04 10:48 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-14 12:59 [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal peter.puhov
2020-07-22 9:12 ` [tip: sched/core] " tip-bot2 for Peter Puhov
2020-11-02 10:50 ` [PATCH v1] " Mel Gorman
2020-11-02 11:06 ` Vincent Guittot
2020-11-02 14:44 ` Phil Auld
2020-11-02 16:52 ` Mel Gorman
2020-11-04 9:42 ` Mel Gorman
2020-11-04 10:06 ` Vincent Guittot
2020-11-04 10:47 ` Mel Gorman [this message]
2020-11-04 11:34 ` Vincent Guittot
2020-11-06 12:03 ` Mel Gorman
2020-11-06 13:33 ` Vincent Guittot
2020-11-06 16:00 ` Mel Gorman
2020-11-06 16:06 ` Vincent Guittot
2020-11-06 17:02 ` Mel Gorman
2020-11-09 15:24 ` Phil Auld
2020-11-09 15:38 ` Mel Gorman
2020-11-09 15:47 ` Phil Auld
2020-11-09 15:49 ` Vincent Guittot
2020-11-10 14:05 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201104104755.GC3371@techsingularity.net \
--to=mgorman@techsingularity.net \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=jhladky@redhat.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=pauld@redhat.com \
--cc=peter.puhov@linaro.org \
--cc=peterz@infradead.org \
--cc=robert.foley@linaro.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).