linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matt Fleming <matt@codeblueprint.co.uk>
To: Josef Bacik <josef@toxicpanda.com>
Cc: Joel Fernandes <joelaf@google.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Juri Lelli <Juri.Lelli@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>,
	Brendan Jackman <brendan.jackman@arm.com>,
	Chris Redpath <Chris.Redpath@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: wake_wide mechanism clarification
Date: Fri, 30 Jun 2017 14:11:24 +0100	[thread overview]
Message-ID: <20170630131124.GB12077@codeblueprint.co.uk> (raw)
In-Reply-To: <20170630004912.GA2457@destiny>

On Thu, 29 Jun, at 08:49:13PM, Josef Bacik wrote:
> 
> It may be worth to try with schedbench and trace it to see how this turns out in
> practice, as that's the workload that generated all this discussion before.  I
> imagine generally speaking this works out properly.  The small regression I
> reported before was at low RPS, so we wouldn't be waking up as many tasks as
> often, so we would be returning 0 from wake_wide() and we'd get screwed.  This
> is where I think possibly dropping the slave < factor part of the test would
> address that, but I'd have to trace it to say for sure.  Thanks,

Just 2 weeks ago I was poking at wake_wide() because it's impacting
hackbench times now we're better at balancing on fork() (see commit
6b94780e45c1 ("sched/core: Use load_avg for selecting idlest group")).

What's happening is that occasionally the hackbench times will be
pretty large because the hackbench tasks are being pulled back and
forth across NUMA domains due to the wake_wide() logic.

Reproducing this issue does require a NUMA box with more CPUs than
hackbench tasks. I was using an 80-cpu 2 NUMA node box with 1
hackbench group (20 readers, 20 writers).

I did the following very quick hack,

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a1f5efa51dc7..c1bc1b0434bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5055,7 +5055,7 @@ static int wake_wide(struct task_struct *p)

        if (master < slave)
                swap(master, slave);
-       if (slave < factor || master < slave * factor)
+       if (master < slave * factor)
                return 0;
        return 1;
 }

Which produces the following results for the 1 group (40 tasks) on one
of SUSE's enterprise kernels:

hackbench-process-pipes
                            4.4.71                4.4.71
                          patched+patched+-wake-wide-fix
Min      1        0.7000 (  0.00%)      0.8480 (-21.14%)
Amean    1        1.0343 (  0.00%)      0.9073 ( 12.28%)
Stddev   1        0.2373 (  0.00%)      0.0447 ( 81.15%)
CoeffVar 1       22.9447 (  0.00%)      4.9300 ( 78.51%)
Max      1        1.2270 (  0.00%)      0.9560 ( 22.09%)

You'll see that the minimum value is worse with my change, but the
maximum is much better.

So the current wake_wide() code does help sometimes, but it also hurts
sometimes too.

I'm happy to gather performance data for any code suggestions.

      parent reply	other threads:[~2017-06-30 13:11 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-30  0:19 wake_wide mechanism clarification Joel Fernandes
2017-06-30  0:49 ` Josef Bacik
2017-06-30  3:04   ` Joel Fernandes
2017-06-30 14:28     ` Josef Bacik
2017-06-30 17:02       ` Mike Galbraith
2017-06-30 17:55         ` Josef Bacik
2017-08-03 10:53           ` Brendan Jackman
2017-08-03 13:15             ` Josef Bacik
2017-08-03 15:05               ` Brendan Jackman
2017-08-09 21:22                 ` Atish Patra
2017-08-10  9:48                   ` Brendan Jackman
2017-08-10 17:41                     ` Atish Patra
2017-07-29  8:01         ` Joel Fernandes
2017-07-29  8:13           ` Joel Fernandes
2017-08-02  8:26             ` Michael Wang
2017-08-03 23:48               ` Joel Fernandes
2017-07-29 15:07           ` Mike Galbraith
2017-07-29 20:19             ` Joel Fernandes
2017-07-29 22:28               ` Joel Fernandes
2017-07-29 22:41                 ` Joel Fernandes
2017-07-31 12:21                   ` Josef Bacik
2017-07-31 13:42                     ` Mike Galbraith
2017-07-31 14:48                       ` Josef Bacik
2017-07-31 17:23                         ` Mike Galbraith
2017-07-31 16:21                     ` Joel Fernandes
2017-07-31 16:42                       ` Josef Bacik
2017-07-31 17:55                         ` Joel Fernandes
2017-06-30  3:11   ` Mike Galbraith
2017-06-30 13:11   ` Matt Fleming [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170630131124.GB12077@codeblueprint.co.uk \
    --to=matt@codeblueprint.co.uk \
    --cc=Chris.Redpath@arm.com \
    --cc=Juri.Lelli@arm.com \
    --cc=brendan.jackman@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    --cc=umgwanakikbuti@gmail.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).