From: Josef Bacik <josef@toxicpanda.com>
To: Joel Fernandes <joelaf@google.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Juri Lelli <Juri.Lelli@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>,
	Brendan Jackman <brendan.jackman@arm.com>,
	Chris Redpath <Chris.Redpath@arm.com>
Subject: Re: wake_wide mechanism clarification
Date: Fri, 30 Jun 2017 10:28:17 -0400	[thread overview]
Message-ID: <20170630142815.GA9743@destiny> (raw)
In-Reply-To: <CAJWu+opFTvnGnYyRo9eYZJUxiHdoMCMoadUvnX3p=6ypwWX_Jw@mail.gmail.com>

On Thu, Jun 29, 2017 at 08:04:59PM -0700, Joel Fernandes wrote:
> Hi Josef,
> 
> Thanks a lot for your reply :-)
> 
> On Thu, Jun 29, 2017 at 5:49 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> > Because we are trying to detect the case that the master is waking many
> > different processes, and the 'slave' processes are only waking up the
> > master/some other specific processes to determine if we don't care about cache
> > locality.
> >
> >>
> >> The code here is written as:
> >>
> >> if (slave < factor || master < slave * factor)
> >>    return 0;
> >>
> >> However I think we should just do (with my current and probably wrong
> >> understanding):
> >>
> >> if (slave < factor || master < factor)
> >>     return 0;
> >>
> >
> > Actually I think both are wrong, but I need Mike to weigh in.  In my example
> > above we'd return 0, because the 'producer' will definitely have a wakee_flip of
> > ridiculous values, but the 'consumer' could essentially have a wakee_flip of 1,
> > just the master to tell it that it's done.  I _suppose_ in practice you have a
> > lock or something so the wakee_flip isn't going to be strictly 1, but some
> > significantly lower value than master.  I'm skeptical of the slave < factor
> > test, I think it's too high of a bar in the case where cache locality doesn't
> > really matter, but the master < slave * factor makes sense, as slave is going to
> > be orders of magnitude lower than master.
> 
> That makes sense that we multiply slave's flips by a factor because
> it's low, but I still didn't get why the factor is chosen to be
> llc_size instead of something else for the multiplication with slave
> (slave * factor). I know slave's flips are probably low and need to be
> higher, but why is it multiplied by llc_size rather than some other
> number (in other words, why are we tying the size of a NUMA node or a
> cluster to the number of wake-ups that the slave made)? Is this
> because of an assumption that a master is expected to evenly
> distribute work amongst CPUs within its node, or something like that?
> 

Yeah, I don't know why llc_size was chosen, but I've been thinking about this on
and off since last night and I don't really know what the right thing to use
would be.  We need some arbitrary number, and any arbitrary number is going to
be wrong for one workload while it's fine for another.

What we really want is to know how many different tasks we could potentially
wake up to compare against, and that's kind of impossible.  If we did it based
solely on actual wakee_flips values we could end up with the case that one task
with 2 worker tasks is always wake_wide, and that may not be as helpful.
With processes that are threaded, sure, we have a readily available number to
use, but for things that are discrete processes talking over, say, a pipe or
shared memory, that's going to be impossible to tease out.  I haven't had my Mt.
Dew this morning, so if you have a better idea for a factor I'm all ears.

> Moreover, this case is only for when slave wake-ups are far lower than
> the master's. But what about the case where slave wake-ups are greater
> than the factor but approximately equal to or around the same as the
> master's? Then it sounds like (master < slave * factor) can return
> true. In that case wake_wide() will be 0. That sounds like a bad
> thing to do, I think - pulling a busy slave onto a busy master.
> 

This I think is a flaw in our load balancing assumptions.  I've been drowning in
this code recently because of a cgroups imbalance problem.  We don't really
propagate load well for processes that are on the rq but haven't run yet, so I
feel this plays into this problem where we think the CPU we're pulling to has
plenty of capacity when in reality it doesn't.

I'm going to tool around with this logic some this morning and see if I can make
it a little bit smarter; I'll let you know how it goes.  Thanks,

Josef

Thread overview: 29+ messages
2017-06-30  0:19 wake_wide mechanism clarification Joel Fernandes
2017-06-30  0:49 ` Josef Bacik
2017-06-30  3:04   ` Joel Fernandes
2017-06-30 14:28     ` Josef Bacik [this message]
2017-06-30 17:02       ` Mike Galbraith
2017-06-30 17:55         ` Josef Bacik
2017-08-03 10:53           ` Brendan Jackman
2017-08-03 13:15             ` Josef Bacik
2017-08-03 15:05               ` Brendan Jackman
2017-08-09 21:22                 ` Atish Patra
2017-08-10  9:48                   ` Brendan Jackman
2017-08-10 17:41                     ` Atish Patra
2017-07-29  8:01         ` Joel Fernandes
2017-07-29  8:13           ` Joel Fernandes
2017-08-02  8:26             ` Michael Wang
2017-08-03 23:48               ` Joel Fernandes
2017-07-29 15:07           ` Mike Galbraith
2017-07-29 20:19             ` Joel Fernandes
2017-07-29 22:28               ` Joel Fernandes
2017-07-29 22:41                 ` Joel Fernandes
2017-07-31 12:21                   ` Josef Bacik
2017-07-31 13:42                     ` Mike Galbraith
2017-07-31 14:48                       ` Josef Bacik
2017-07-31 17:23                         ` Mike Galbraith
2017-07-31 16:21                     ` Joel Fernandes
2017-07-31 16:42                       ` Josef Bacik
2017-07-31 17:55                         ` Joel Fernandes
2017-06-30  3:11   ` Mike Galbraith
2017-06-30 13:11   ` Matt Fleming
